This document provides an overview of Kaitai Struct project from a developer’s perspective: general architecture, infrastructure, etc.
1. Compiler’s architecture
The heart of Kaitai Struct project is obviously a reference compiler. As all compilers, it "compiles" or "translates" input files (Kaitai Struct YAML, .ksy) into output files (source code in target programming languages, like C++, Java, etc).
In order to do this translation, compiler performs several major steps:
-
Loading of YAML files and parsing them into in-memory tree of objects
-
"Pre-compilation" - a set of preparatory actions (such as type inferring, resolving names, compile-time sanity checks), which are the same for all target languages
-
"Compilation" - traversal of the KSY object tree in certain order, rendering source code in target language by application of certain "templates"
2. Entry point
Before these 3 keys steps are performed, there is always some entry point to get us started. There are multiple, platform-dependent entry points in the compiler:
-
JVM: io.kaitai.struct.JavaMain does:
-
command-line argument parsing, which results in a
CLIConfigobject -
runs JavaMain.run with it
-
runs JavaMain.compileOneInput for every .ksy source file specified, doing lots of wrapping to properly handle regular output and exceptions
-
-
JS: io.kaitai.struct.MainJs has the main
compilemethod
3. 3-step compilation process
Current implementation bundles steps 1 and 2 into one invocation:
-
For JVM, it is implemented in
JavaKSYParser.localFileToSpecs -
For JS, it is implemented in
JavaScriptKSYParser.yamlToSpecs- although note that whole JS is heavily async, so it returnsFuture[ClassSpecs], notClassSpecsobject itself.
3.1. Loading and parsing of YAML files
The aim of this stage is, given a list of file names to load
(typically, these would be .ksy files) to load them, parse them as
YAML, and convert them to all into elements of
io.kaitai.struct.format - typically various *Spec things, like
ClassSpec, AttributeSpec, etc. End result is a single ClassSpecs
object, which incorporates one or many ClassSpec which define user
types.
3.1.1. Loading YAML files
Loading YAML files is currently, unfortunately, done by external library in platform-specific manner:
-
For JVM, it calls SnakeYAML. Everything related to this step is encapsulated in
JavaKSYParser. -
For JS, it calls: TODO
We’re working on bringing pure Scala YAML parser, but it’s a relatively distant goal: see #229.
3.1.2. Parsing
Conversion from YAML objects to Spec-objects is typically performed by
invocation of ClassSpec.fromYaml.
3.1.3. Loading imports
After we’ve parsed YAML, we can recursively load all files mentioned in imports. The biggest catch in this process is that it is effectively reading more disk files + running more YAML parsing, i.e. it’s a step back, again into platform-dependent territory.
3.2. Precompilation
The aim of this stage is to do preparatory language-independent
actions. The whole step is invoked from Main.precompile, but
individual substeps are implemented by classes in
io.kaitai.struct.precompile. Please refer to per-class documentation
(if it exists) to every particular step.
Precompilation modifies (enriches) existing ClassSpecs object. Alternatively, it might throw an exception if some of the validation checks failed (TODO: exception structure).
3.3. Compilation
Compilation is a final step, which converts enriched ClassSpecs into source code in target language.
This task is obviously dependent on target language, thus it is performed by language-specific class.
There are 2 main variations of implementing these:
-
Classes which inherit
AbstractCompilerdirectly, such asGraphVizCompiler, do everything from scratch and organize generation flow in some arbitrary manner. -
Most traditional languages (such as Java, Ruby, Python, C++, etc), which has something in common, use ready-made
ClassCompiler, which is a simple skeleton for compiling something resembling typical understanding of a "class" with members/methods. To introduce language-specific behavior, one can:-
Provide implementation of
LanguageCompiler, which is acts like a template with many different simple methods like "start new file", "start new class", "finish a class", etc. -
Subclass
ClassCompiler(likeGoClassCompiler), overriding some of the control flow.
-
3.3.1. Translators
During compilation process, we occasionally need to do translation of
KS expression language (which is target language-agnostic) into actual
target language snippets. "Translators" are per-language classes
reside in io.kaitai.struct.translators which implement that
translation. The most important method they provide is translate,
which gets KS expression and is expected to return a string in target
language.
TODO: explain about translators which do not generated only a string (i.e. Go).