Version 0.2, 2021-05-26
The Concise Data Identifier (CDI) standard provides a common means to declare data formats in binary files and in memory. The identifiers are 32 bit words, which allows for quick and easy comparison at run time. Values for widely used formats are defined and can be used together with private, application specific values.
The standard also defines a few data types and a couple of container formats.
A Concise Data Identifier is a 4 byte, big endian value containing 3 fields. The fields are the CDI Fingerprint, the data type Category, and a format Selector within the category. The Category and Selector together declare a Format.
Fingerprint (DA7A) | Category | Selector |
---|---|---|
16 | 4 | 12 |
The official list of assigned public identifiers is the cdi.h header.
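The 16/4/12 bit layout above can be sketched as a few C helpers. This is a minimal illustration, not the contents of cdi.h: the names `CDI_FINGERPRINT`, `cdi_make`, `cdi_is_valid`, `cdi_category`, `cdi_selector`, `cdi_format` and `cdi_read` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for illustration; they are not taken from cdi.h. */
#define CDI_FINGERPRINT 0xDA7Au

/* Build a CDI value from a 4 bit category and a 12 bit selector. */
static inline uint32_t cdi_make(unsigned category, unsigned selector)
{
    return ((uint32_t)CDI_FINGERPRINT << 16)
         | ((uint32_t)(category & 0xFu) << 12)
         | (uint32_t)(selector & 0xFFFu);
}

/* True if the upper 16 bits hold the DA7A fingerprint. */
static inline int cdi_is_valid(uint32_t cdi)
{
    return (cdi >> 16) == CDI_FINGERPRINT;
}

/* Extract the individual fields. */
static inline unsigned cdi_category(uint32_t cdi) { return (cdi >> 12) & 0xFu; }
static inline unsigned cdi_selector(uint32_t cdi) { return cdi & 0xFFFu; }

/* Category and Selector together form the 16 bit Format. */
static inline unsigned cdi_format(uint32_t cdi) { return cdi & 0xFFFFu; }

/* Read a CDI stored as 4 big endian bytes. */
static inline uint32_t cdi_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because the whole identifier fits in one word, testing for a given format is a single integer comparison, for example `cdi_read(buf) == cdi_make(1, 0x002)`.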
The assigned categories are as follows:
# | Data Type | Example Formats |
---|---|---|
0 | Text | 0000 ASCII, 0001 UTF8, 0002 HTML, 0003 Markdown |
1 | Image | 1000 PPM, 1001 PBM, 1002 PNG, 1003 JPEG, 1004 SVG |
2 | Audio | 2000 RAW-S8, 2001 RAW-S16, 2006 WAVE |
3 | Video | |
4 | Bytecode | |
5 | Geometry | 5000 OBJ, 5001 glTF, 5002 STL |
6 | Animation | 6000 IQM |
7 | Container | 7000 CDI-Pak, 7001 CDI-Chunk, 7002 IFF, 7003 MP4 |
8 | Archive | 8000 zip, 8001 lha, 8002 bz2 |
9 | ||
A | ||
B | ||
C | ||
D | ||
E | ||
F | Private |
The Private category is for experimental or application specific data.
The 12 bit selector allows 4032 universal formats to be defined within each category.
The selector range 0xFC0-0xFFF is reserved for 64 application specific data types within each category.
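As a sketch of how an application might tell public values from private ones, the check below treats a CDI as private when it uses category F or a selector in the reserved range. The function name `cdi_is_private` and the constant names are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration. */
enum {
    CDI_CATEGORY_PRIVATE  = 0xF,    /* category F is Private */
    CDI_SELECTOR_PRIVATE0 = 0xFC0   /* first reserved selector: 4096 - 64 */
};

/* True if the value lies in the Private category or uses one of the
   64 application specific selectors (0xFC0-0xFFF) of any category. */
static inline bool cdi_is_private(uint32_t cdi)
{
    unsigned category = (cdi >> 12) & 0xFu;
    unsigned selector = cdi & 0xFFFu;
    return category == CDI_CATEGORY_PRIVATE
        || selector >= CDI_SELECTOR_PRIVATE0;
}
```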
This file container format defines a series of data chunks followed by a table of contents. It is similar to the WAD format used in the Doom games, but is more suited as a universal container.
The CDI value for a CDI Package is DA7A7000.
All offset and length fields are 32 bit, little endian values. All offset values are in bytes from the start of the file. All length values are in bytes.
DA7A 7000 | App_ID | TOC_Off | TOC_Len |
---|---|---|---|
32 | 32 | 32 | 32 |
The header App_ID field denotes the types of private CDI values used within the package. If this field is zero then no private CDI values may be present in the package.
CDI | App_ID | Offset | Length |
---|---|---|---|
32 | 32 | 32 | 32 |
The TOC entry App_ID field is a unique, application defined identifier for the chunk.
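A reader for the package header and table of contents might look like the sketch below, which works on a file already loaded into memory. The type and function names (`PakEntry`, `pak_toc_count`, `pak_toc_entry`) are hypothetical. Two points are assumptions rather than statements of the standard: App_ID is decoded as little endian like the offset and length fields, and the number of TOC entries is taken to be TOC_Len divided by the 16 byte entry size.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded table of contents entry (hypothetical type). */
typedef struct {
    uint32_t cdi;     /* format of the chunk data */
    uint32_t app_id;  /* application defined chunk identifier */
    uint32_t offset;  /* bytes from the start of the file */
    uint32_t length;  /* chunk size in bytes */
} PakEntry;

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t rd_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns the number of TOC entries, or -1 if the buffer does not
   look like a well formed package. */
static long pak_toc_count(const uint8_t *file, size_t size)
{
    if (size < 16 || rd_be32(file) != 0xDA7A7000u)
        return -1;
    uint32_t toc_off = rd_le32(file + 8);
    uint32_t toc_len = rd_le32(file + 12);
    if (toc_off > size || toc_len > size - toc_off || toc_len % 16 != 0)
        return -1;
    return (long)(toc_len / 16);   /* assumption: 16 bytes per entry */
}

/* Decodes TOC entry `index` into `out`. Returns 0 on success, or -1 if
   the index or the chunk bounds are invalid. */
static int pak_toc_entry(const uint8_t *file, size_t size, long index,
                         PakEntry *out)
{
    long count = pak_toc_count(file, size);
    if (index < 0 || index >= count)
        return -1;
    const uint8_t *e = file + rd_le32(file + 8) + (size_t)index * 16;
    out->cdi    = rd_be32(e);       /* CDI values are big endian */
    out->app_id = rd_le32(e + 4);   /* assumption: little endian */
    out->offset = rd_le32(e + 8);
    out->length = rd_le32(e + 12);
    if (out->offset > size || out->length > size - out->offset)
        return -1;
    return 0;
}
```

Keeping the table of contents at the end lets a writer append chunks as they are produced and emit the table last, then patch TOC_Off and TOC_Len in the header.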
This file container format defines a series of data chunks.
The CDI value for the CDI Chunk format is DA7A7001.
All length fields are 32 bit, little endian values.
DA7A 7001 | App_ID | Length |
---|---|---|
32 | 32 | 32 |
The header App_ID field denotes the types of private CDI values used within all following chunks. If this field is zero then no private CDI values may be present in any chunks.
The Length field is the total number of bytes of all following chunks, excluding any terminator chunk.
CDI | Length |
---|---|
32 | 32 |
Zero | Ignored |
---|---|
32 | 32 |
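Walking a CDI Chunk stream could be sketched as the iterator below. Several details are assumptions, since the text above does not spell them out: each chunk's Length is taken to count only the data bytes after its 8 byte chunk header, the file header Length is taken to include those chunk headers, chunks are assumed to be packed without padding, and a chunk whose CDI is zero is treated as the terminator. The names `Chunk`, `ChunkIter`, `chunk_iter_init` and `chunk_iter_next` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t       cdi;     /* format of the chunk data */
    const uint8_t *data;    /* first data byte */
    uint32_t       length;  /* data size in bytes */
} Chunk;

typedef struct {
    const uint8_t *pos;     /* next chunk header */
    const uint8_t *end;     /* end of the region given by the file header */
} ChunkIter;

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint32_t rd_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Validates the 12 byte file header. Returns 0 on success, -1 on error. */
static int chunk_iter_init(ChunkIter *it, const uint8_t *file, size_t size)
{
    if (size < 12 || rd_be32(file) != 0xDA7A7001u)
        return -1;
    uint32_t total = rd_le32(file + 8);
    if (total > size - 12)
        return -1;
    it->pos = file + 12;
    it->end = it->pos + total;
    return 0;
}

/* Returns 1 and fills `out` for each chunk, 0 at the end of the stream,
   or -1 if a chunk runs past the declared total length. */
static int chunk_iter_next(ChunkIter *it, Chunk *out)
{
    size_t left = (size_t)(it->end - it->pos);
    if (left == 0)
        return 0;
    if (left < 8)
        return -1;
    uint32_t cdi = rd_be32(it->pos);
    uint32_t len = rd_le32(it->pos + 4);
    if (cdi == 0)               /* assumption: zero CDI is a terminator */
        return 0;
    if (len > left - 8)
        return -1;
    out->cdi    = cdi;
    out->data   = it->pos + 8;
    out->length = len;
    it->pos    += 8 + (size_t)len;
    return 1;
}
```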
This format defines a block of UTF-8, NUL terminated strings. It can be used for symbol tables.
The CDI value for the String Table format is DA7A0006.
There are three defined forms that string tables may take. All forms start with the same 32 bit header.
Form | String_Count |
---|---|
8 | 24 |
# | Description | Data Layout |
---|---|---|
0 | No Index | [Header] [Strings] |
1 | 16 Bit Index | [Header] [Index] [Strings] |
2 | 32 Bit Index | [Header] [Index] [Strings] |
The index is an array of unsigned 16 or 32 bit, little endian values of the byte offsets into the strings section.
Here is the binary data for a form 1 table with three strings of “Gems”, “Hard Stop”, & “facet”:
00000000: 0100 0003 0000 0500 0f00 4765 6d73 0048 ..........Gems.H
00000010: 6172 6420 5374 6f70 0066 6163 6574 00 ard Stop.facet.
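The dump above can be decoded with a lookup routine along these lines. It is a sketch under two assumptions read off the example bytes rather than stated in the text: the 24 bit String_Count is stored most significant byte first (`00 00 03`), and index offsets are relative to the start of the strings section. The function name `strtab_get` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a pointer to string number `n` inside the table, or NULL if
   the table is malformed or `n` is out of range. */
static const char *strtab_get(const uint8_t *tab, size_t size, uint32_t n)
{
    if (size < 4)
        return NULL;
    unsigned form  = tab[0];
    uint32_t count = ((uint32_t)tab[1] << 16) | ((uint32_t)tab[2] << 8)
                   | (uint32_t)tab[3];
    if (form > 2 || n >= count)
        return NULL;

    /* Index entry size in bytes: none, 16 bit, or 32 bit. */
    size_t entry       = (form == 1) ? 2 : (form == 2) ? 4 : 0;
    size_t index_bytes = entry * count;
    if (index_bytes > size - 4)
        return NULL;

    const uint8_t *index       = tab + 4;
    const uint8_t *strings     = index + index_bytes;
    size_t         strings_len = size - 4 - index_bytes;
    size_t         off         = 0;

    if (form == 0) {
        /* No index: skip over n NUL terminated strings. */
        for (uint32_t i = 0; i < n; i++) {
            const uint8_t *nul = memchr(strings + off, 0, strings_len - off);
            if (!nul)
                return NULL;
            off = (size_t)(nul - strings) + 1;
        }
    } else if (form == 1) {
        const uint8_t *e = index + 2 * (size_t)n;
        off = (size_t)e[0] | ((size_t)e[1] << 8);
    } else {
        const uint8_t *e = index + 4 * (size_t)n;
        off = (size_t)e[0] | ((size_t)e[1] << 8)
            | ((size_t)e[2] << 16) | ((size_t)e[3] << 24);
    }

    /* The string must start and be NUL terminated inside the table. */
    if (off >= strings_len || !memchr(strings + off, 0, strings_len - off))
        return NULL;
    return (const char *)(strings + off);
}
```

With an index, lookup is a single array read. Form 0 saves the index bytes at the cost of a linear scan.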
This format defines a series of 8 bit color values with alpha. It can be used for storing image CLUTs.
The CDI value for the RGBA format is DA7A1006.
The number of color entries present is specified elsewhere, such as in a Chunk Header.
Red | Green | Blue | Alpha |
---|---|---|---|
8 | 8 | 8 | 8 |
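As an illustration, a color entry maps onto four consecutive bytes. The sketch below assumes the entry count is derived from an enclosing chunk's byte length, which is only one of the places the count might come from. The names `CdiRgba`, `rgba_entry_count` and `rgba_get` are hypothetical.

```c
#include <stdint.h>

/* One color entry in file order: red, green, blue, alpha. */
typedef struct {
    uint8_t r, g, b, a;
} CdiRgba;

/* Assumption: the entry count comes from a chunk length in bytes. */
static inline uint32_t rgba_entry_count(uint32_t chunk_length)
{
    return chunk_length / 4;
}

/* Decode entry `i` from raw RGBA data. The caller must ensure that
   `i` is less than the entry count. */
static inline CdiRgba rgba_get(const uint8_t *data, uint32_t i)
{
    const uint8_t *p = data + 4 * (uint32_t)i;
    CdiRgba c = { p[0], p[1], p[2], p[3] };
    return c;
}
```

Reading the bytes one at a time avoids any dependence on host endianness or struct padding.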