Invocation

Quick start

Parsing from local file:

#include <fstream>
#include <kaitai/kaitaistream.h>

std::ifstream is("path/to/local/file.dat", std::ifstream::binary);
kaitai::kstream ks(&is);
example_t data(&ks);

Parsing from std::string:

#include <fstream>
#include <kaitai/kaitaistream.h>
#include <sstream>

std::string buf;
std::istringstream is(buf);
kaitai::kstream ks(&is);
example_t data(&ks);

Auto-read

By default, invoking constructor with a stream argument assumes that you want to run parsing process and populate object’s fields with the data read from the stream:

example_t data(&ks);
data.some_attribute(); // already populated and available

However, sometimes you want more control and want to trigger that process manually. In that case, you can supply --no-auto-read argument for kaitai-struct-compiler, and you’ll have to trigger reading manually using _read() invocation:

example_t data(&ks);
data.some_attribute(); // not yet populated, may contain random garbage
data._read();
data.some_attribute(); // populated and available

There are several reasons why you’d want to separate object creation and object population:

  • Obviously, if you’re using read-write mode with the intent to populate the object manually and call _write() afterwards to serialize it. In this use-case, you probably won’t call _read() at all.

  • You want to reuse the same object and thus want to repeatedly call _read() on several times manually.

  • If you’re not using smart pointers (see below), any exception (data validation, IO error, etc) during object creation might result in memory leak, as destructor will NOT be executed.

Ownership model

In all modes, Kaitai Struct follows the same ownership model:

  • If something is created during the parsing process, it belongs to the enclosing object which represents the user type. As soon as enclosing object will be deleted, it will take care of deletion (and cleanup) of all owned attributes.

  • If something is assigned to a user type using set…​() methods for serialization process, after set…​() is completed, user type assumes ownership of everything you’ve passed through set…​().

  • Everything else passed in a constructor and/or any other invocations, is not owned by user type (and will not be cleaned up automatically), namely:

  • root object reference/pointer

  • parent object reference/pointer

  • IO object

  • type parameters

To illustrate these principles, using the following .ksy spec:

meta:
  id: example
seq:
  - id: foo
    type: block
types:
  block:
    seq:
      - id: bar
        type: u1

You would always do reading this way:

// On stack
{
    kaitai::kstruct in_stream(...);  // belongs here
    example_t example(&in_stream);   // belongs here

    // assumes auto-read; in case of --no-auto-read, add:
    // example._read();

    example_t::block_t* foo = example->foo();
    // does not belong here, belongs to user type

    // can use `foo` here:
    int foo_bar = foo->bar();

    // but must not delete it:
    // delete foo; // ILLEGAL: will result in double free

    // example & in_stream get deleted here as they go out of scope
}

// On heap
{
    kaitai::kstruct* in_stream = new kaitai::kstruct(...);  // belongs here
    example_t* example = new example_t(in_stream);          // belongs here

    // assumes auto-read; in case of --no-auto-read, add:
    // example->_read();

    example_t::block_t* foo = example->foo();
    // does not belong here, belongs to user type

    // can use `foo` here:
    int foo_bar = foo->bar();

    // but must not delete it:
    // delete foo; // ILLEGAL: will result in double free

    // delete in reverse order to order of creation
    delete example;
    // also deletes `foo`, so `foo` should not be used after this point

    // int foo_bar_2 = foo->bar(); // ILLEGAL: foo pointer is already deleted

    delete in_stream;
}

Pointers model

TODO: raw pointers, unique+raw pointers

Primitive type mapping

Mapping KS types to C++ is pretty straight-forward:

type C++ type

no type

std::string

u1

uint8_t

u2

uint16_t

u4

uint32_t

u8

uint64_t

s1

int8_t

s2

int16_t

s4

int32_t

s8

int64_t

str, strz

std::string

Note that both byte arrays and strings are mapped to std::string — that’s because when we store byte array, we need something that would be able to both hold the byte buffer and store it’s length (or at least able to derive it).

String encoding

There’s no universal agreement on dealing with encodings in C++, so KS allows you to choose one of the few popular approaches. You can choose how to deal with string encoding using a compile-time define.

  • KS_STR_ENCODING_NONE: Ignore encodings at all. In this mode, all string parsing operations just ignore any encoding specifications and pass raw bytes as a string to application. Note that in some cases it might break some .ksy files that actually depend on string being properly decoded / converted.

  • Convert all incoming byte streams into strings in a single, one-size-fits-all encoding (for example, UTF8, as suggested by UTF8 Everywhere Manifesto). Since there’s no universal way to do it, KS would use one of platform-dependent ways (which can be also enforced by specifying specific defines):

    • KS_STR_ENCODING_ICONV: Use POSIX iconv library — usually preinstalled (or included in libc) on all POSIX systems, can be linked as external library on most other systems (i.e. Windows)

    • (not implemented yet) Use Windows API functions MultiByteToWideChar and WideCharToMultiByte — obviously, available only on Windows platform

    • (not implemented yet) Use ICU library

Null values

In certain cases, namely when using if with an expression that will be false, a certain attribute won’t be parsed. For example:

seq:
  - id: foo
    type: u1
  - id: bar
    type: u1
    if: foo == 42

If foo is not 42, then an unsigned 1-byte integer bar won’t be parsed. By general convention, Kaitai Struct makes sure that bar is equal to a null value, to be able to distinguish such a situation (as opposed to having some value). However, it’s not possible to do so for many primitive (non-reference) types in C++. In the example above, bar will have type uint8_t, and assigning null to it would just set it to 0, thus we won’t be able to distinguish a situation when bar was read and we’ve got 0, and bar wasn’t read.

To work around this situation, Kaitai Struct generates special null checking methods for every attribute that can be null: _is_null_ATTRIBUTE, where ATTRIBUTE is the name of the attribute. Thus, the proper way to use such nullable values is something like:

if (!r->_is_null_bar()) {
    uint8_t bar = r->bar();
    // `bar` is defined, use `bar` here
} else {
    // `bar` is null because of failed `if` comparison
    // note that accessing r->bar() will return an uninitialized value
    // (i.e. random garbage)
}