Version 0.3, 2026-03-16
The Concise Data Identifier (CDI) standard provides a common means to declare data formats in binary files and in memory. The identifiers are 32 bit words, which allows quick and easy comparison at run time. Values for widely used formats are defined and can be used together with private, application specific values.
The standard also defines a few data types and a couple of container formats.
A Concise Data Identifier is a 4 byte, big endian value containing 3 fields. The fields are the CDI Fingerprint, the data type Category, and a format Selector within the category. The Category and Selector together declare a Format.
| Fingerprint (DA7A) | Category | Selector |
|---|---|---|
| 16 | 4 | 12 |
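The field layout above can be sketched in C as a handful of shift-and-mask helpers. This is a minimal illustration only: the names `cdi_make`, `cdi_is_valid`, `cdi_category`, `cdi_selector`, `cdi_format`, and `cdi_load` are hypothetical and are not taken from cdi.h.

```c
#include <assert.h>
#include <stdint.h>

#define CDI_FINGERPRINT 0xDA7Au  /* upper 16 bits of every CDI */

/* Build a CDI from a 4 bit category and a 12 bit selector. */
static inline uint32_t cdi_make(unsigned category, unsigned selector)
{
    assert(category <= 0xFu && selector <= 0xFFFu);
    return ((uint32_t)CDI_FINGERPRINT << 16) |
           ((uint32_t)category << 12) | (uint32_t)selector;
}

/* True when the upper 16 bits hold the CDI Fingerprint. */
static inline int cdi_is_valid(uint32_t cdi) { return (cdi >> 16) == CDI_FINGERPRINT; }

static inline unsigned cdi_category(uint32_t cdi) { return (cdi >> 12) & 0xFu; }
static inline unsigned cdi_selector(uint32_t cdi) { return cdi & 0xFFFu; }

/* Category and Selector together: the 16 bit Format. */
static inline unsigned cdi_format(uint32_t cdi) { return cdi & 0xFFFFu; }

/* Read a CDI from its 4 byte, big endian stored form. */
static inline uint32_t cdi_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because a CDI is a single 32 bit word, checking for a known format is one integer comparison, for example `cdi_load(p) == 0xDA7A7000u`.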
The official list of assigned public identifiers is the cdi.h header.
The assigned categories are as follows:
| # | Data Type | Example Formats |
|---|---|---|
| 0 | Text | 0000 ASCII, 0001 UTF8, 0002 HTML, 0003 Markdown |
| 1 | Image | 1000 PPM, 1001 PBM, 1002 PNG, 1003 JPEG, 1004 SVG |
| 2 | Audio | 2000 RAW-S8, 2001 RAW-S16, 2006 WAVE |
| 3 | Video | |
| 4 | Bytecode | |
| 5 | Geometry | 5000 OBJ, 5001 glTF, 5002 STL |
| 6 | Animation | 6000 IQM |
| 7 | Container | 7000 CDI-Pak, 7001 CDI-Chunk, 7002 IFF, 7003 MP4 |
| 8 | Archive | 8000 zip, 8001 lha, 8002 bz2 |
| 9 | State | |
| A | ||
| B | ||
| C | ||
| D | ||
| E | ||
| F | Private | |
The State category is for application images, saved game data, or other binary cache data.
The Private category is for experimental or application specific data.
The 12 bit selector allows 4032 universal formats to be defined within each category.
The selector range 0xFC0-0xFFF is reserved
for 64 application specific data types within each category.
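The selector arithmetic can be checked at compile time, and a private value detected with two comparisons. The sketch below treats a CDI as private when it uses either the Private category or the reserved selector range; combining the two tests is an interpretation, and the names are hypothetical rather than taken from cdi.h.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CDI_SELECTOR_PRIVATE_FIRST = 0xFC0,  /* first application specific selector */
    CDI_SELECTOR_LAST          = 0xFFF,  /* largest 12 bit selector */
    CDI_CATEGORY_PRIVATE       = 0xF
};

/* 4096 selectors per category: 4032 universal plus 64 application specific. */
static_assert(CDI_SELECTOR_PRIVATE_FIRST == 4032, "universal selector count");
static_assert(CDI_SELECTOR_LAST - CDI_SELECTOR_PRIVATE_FIRST + 1 == 64,
              "application specific selector count");

/* Assumption: private means Private category OR reserved selector range. */
static inline int cdi_is_private(uint32_t cdi)
{
    unsigned category = (cdi >> 12) & 0xFu;
    unsigned selector = cdi & 0xFFFu;
    return category == CDI_CATEGORY_PRIVATE ||
           selector >= CDI_SELECTOR_PRIVATE_FIRST;
}
```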
The CDI Package file container format defines a series of data chunks followed by a table of contents (TOC). It is similar to the WAD format used in the Doom games.
The CDI value for a CDI Package is DA7A7000.
All offset and length fields are 32 bit, little endian values. All offset values are in bytes from the start of the file. All length values are in bytes.
| DA7A 7000 | App_ID | TOC_Off | TOC_Len |
|---|---|---|---|
| 32 | 32 | 32 | 32 |
The header App_ID field denotes the types of private CDI
values used within the package. If this field is zero then no private
CDI values may be present in the package.
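Reading the 16 byte package header might look like the sketch below. Two things here are assumptions rather than part of the text above: the byte order of App_ID is not stated, so it is read big endian like the CDI, and the check that the TOC lies inside the file is an added sanity test. `PakHeader` and `pak_read_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t app_id;      /* zero means no private CDI values in the package */
    uint32_t toc_offset;  /* bytes from the start of the file */
    uint32_t toc_length;  /* bytes */
} PakHeader;

static inline uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out on success, 0 if the data is not a usable package. */
static inline int pak_read_header(const uint8_t *data, size_t size, PakHeader *out)
{
    assert(data != NULL && out != NULL);
    if (size < 16 || load_be32(data) != 0xDA7A7000u)
        return 0;
    out->app_id     = load_be32(data + 4);   /* ASSUMPTION: big endian */
    out->toc_offset = load_le32(data + 8);
    out->toc_length = load_le32(data + 12);
    /* Added sanity check: the TOC must lie inside the file. */
    if (out->toc_offset > size || out->toc_length > size - out->toc_offset)
        return 0;
    return 1;
}
```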
The table of contents contains an array of CDIEntry and/or
CDIEntryPath structures. If the first two bytes of an entry are
DA7A then it is a CDIEntry; otherwise it is a CDIEntryPath, which
carries the path fields.
| CDI | App_ID | Offset | Length |
|---|---|---|---|
| 32 | 32 | 32 | 32 |
The TOC entry App_ID field is a unique, application
defined identifier for the chunk.
| Path1 | Format | Path2 | FileSI | Offset | Length |
|---|---|---|---|---|---|
| 16 | 16 | 16 | 16 | 32 | 32 |
The Path1, Path2, & FileSI
fields are little endian indices into a Text String Table chunk with an
App_ID of “PATH” (50415448). A Path1 or Path2 value of zero
indicates the respective path node is not used.
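Both entry layouts add up to 128 bits, so a TOC can be walked in 16 byte steps and each entry classified by its first two bytes. The sketch below decodes only the fields whose byte order is stated (the path indices, Offset, and Length); App_ID and Format are left out because their byte order is not given. `TocEntry` and `toc_decode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOC_ENTRY_SIZE = 16 };  /* both layouts total 128 bits */

typedef struct {
    int      has_path;               /* 0 = CDIEntry, 1 = CDIEntryPath */
    uint16_t path1, path2, file_si;  /* meaningful only when has_path is 1 */
    uint32_t offset, length;         /* share one position in both layouts */
} TocEntry;

static inline uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 16 byte TOC entry starting at e. */
static inline TocEntry toc_decode(const uint8_t *e)
{
    TocEntry t = {0};
    assert(e != NULL);
    /* An entry that starts with the bytes DA 7A is a CDIEntry. */
    t.has_path = !(e[0] == 0xDA && e[1] == 0x7A);
    if (t.has_path) {
        t.path1   = load_le16(e + 0);
        /* bytes 2..3 hold Format; not decoded here */
        t.path2   = load_le16(e + 4);
        t.file_si = load_le16(e + 6);
    }
    t.offset = load_le32(e + 8);
    t.length = load_le32(e + 12);
    return t;
}
```

Entry `i` of a TOC would then start at `toc + i * TOC_ENTRY_SIZE`, with the entry count being the TOC length divided by 16.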
The CDI Chunk file container format defines a series of data chunks.
The CDI value for the CDI Chunk format is DA7A7001.
All length fields are 32 bit, little endian values.
| DA7A 7001 | App_ID | Length |
|---|---|---|
| 32 | 32 | 32 |
The header App_ID field denotes the types of private CDI
values used within all following chunks. If this field is zero then no
private CDI values may be present in any chunks.
The Length field is the total number of bytes of all
following chunks, excluding any terminator chunk.
| CDI | Length |
|---|---|
| 32 | 32 |
A terminator chunk has a CDI field of zero, and its Length field is ignored:
| Zero | Ignored |
|---|---|
| 32 | 32 |
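A reader for this layout can step from chunk header to chunk header until the byte count in the file header is used up. The sketch below searches for the first chunk with a given CDI. It assumes that each chunk's Length counts only the payload bytes that follow its 8 byte header, and that the header Length counts chunk headers as well as payloads; neither point is spelled out above. `chunk_find` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns a pointer to the payload of the first chunk whose CDI equals
 * `wanted` and stores its length in *len_out, or returns NULL. */
static inline const uint8_t *chunk_find(const uint8_t *data, size_t size,
                                        uint32_t wanted, uint32_t *len_out)
{
    assert(data != NULL && len_out != NULL);
    if (size < 12 || load_be32(data) != 0xDA7A7001u)
        return NULL;
    uint32_t total = load_le32(data + 8);
    if (total > size - 12)
        return NULL;                       /* header claims more than we have */
    size_t pos = 12, end = 12 + (size_t)total;
    while (end - pos >= 8) {
        uint32_t cdi = load_be32(data + pos);
        uint32_t len = load_le32(data + pos + 4);
        if (cdi == 0)
            break;                         /* terminator chunk: stop */
        if (len > end - pos - 8)
            return NULL;                   /* truncated chunk */
        if (cdi == wanted) {
            *len_out = len;
            return data + pos + 8;
        }
        pos += 8 + (size_t)len;            /* ASSUMPTION: Length is payload only */
    }
    return NULL;
}
```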
The String Table format defines a block of NUL terminated UTF-8 strings. It can be used for symbol tables.
The CDI value for the String Table format is
DA7A0006.
There are three defined forms that string tables may take. All forms start with the same 32 bit header.
| Form | String_Count |
|---|---|
| 8 | 24 |
| # | Description | Data Layout |
|---|---|---|
| 0 | No Index | [Header] [Strings] |
| 1 | 16 Bit Index | [Header] [Index] [Strings] |
| 2 | 32 Bit Index | [Header] [Index] [Strings] |
The index is an array of unsigned 16 or 32 bit, little endian values, one per string, each giving the byte offset of that string within the strings section.
Here is the binary data for a form 1 table with three strings of “Gems”, “Hard Stop”, & “facet”:
00000000: 0100 0003 0000 0500 0f00 4765 6d73 0048 ..........Gems.H
00000010: 6172 6420 5374 6f70 0066 6163 6574 00 ard Stop.facet.
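Looking up string `i` in such a table could be sketched as follows. The header byte order is inferred from the dump above (Form in the first byte, String_Count in the next three bytes, most significant first) and is an assumption; for form 0, which has no index, the sketch simply scans past `i` NUL terminators. `strtab_get` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns string i of the table, or NULL if the table is malformed
 * or i is out of range. */
static inline const char *strtab_get(const uint8_t *t, size_t size, uint32_t i)
{
    assert(t != NULL);
    if (size < 4)
        return NULL;
    unsigned form  = t[0];
    uint32_t count = ((uint32_t)t[1] << 16) | ((uint32_t)t[2] << 8) | t[3];
    if (form > 2 || i >= count)
        return NULL;

    size_t width = (form == 0) ? 0 : (form == 1) ? 2 : 4;  /* index entry size */
    if (width != 0 && (size - 4) / width < count)
        return NULL;                                       /* index overruns data */
    size_t strings = 4 + width * (size_t)count;            /* strings section start */
    size_t avail   = size - strings;
    size_t off     = 0;

    if (form == 0) {
        /* No index: skip i strings by scanning for their terminators. */
        for (uint32_t n = 0; n < i; n++) {
            const uint8_t *nul = memchr(t + strings + off, 0, avail - off);
            if (nul == NULL)
                return NULL;
            off = (size_t)(nul - (t + strings)) + 1;
        }
    } else {
        const uint8_t *e = t + 4 + width * (size_t)i;      /* little endian offset */
        off = e[0] | ((size_t)e[1] << 8);
        if (form == 2)
            off |= ((size_t)e[2] << 16) | ((size_t)e[3] << 24);
    }
    /* The string must start inside the section and be NUL terminated. */
    if (off >= avail || memchr(t + strings + off, 0, avail - off) == NULL)
        return NULL;
    return (const char *)(t + strings + off);
}
```

Applied to the 31 bytes shown above, indices 0, 1, and 2 yield "Gems", "Hard Stop", and "facet".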
The RGBA format defines a series of color entries, each with 8 bit red, green, blue, and alpha components. It can be used for storing image CLUTs.
The CDI value for the RGBA format is DA7A100F.
The number of color entries present is specified elsewhere, such as in a Chunk Header.
| Red | Green | Blue | Alpha |
|---|---|---|---|
| 8 | 8 | 8 | 8 |
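Since each entry is four bytes in a fixed order, reading a CLUT needs no byte swapping. The sketch below assumes the entry count is derived from a byte length supplied elsewhere, such as a chunk header Length that counts payload bytes; `Rgba`, `rgba_count`, and `rgba_at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba;

/* ASSUMPTION: byte_length is the size of the RGBA data in bytes. */
static inline size_t rgba_count(uint32_t byte_length) { return byte_length / 4u; }

/* Fetch entry i; the caller keeps i below rgba_count(). */
static inline Rgba rgba_at(const uint8_t *data, size_t i)
{
    assert(data != NULL);
    const uint8_t *p = data + 4 * i;
    Rgba c = { p[0], p[1], p[2], p[3] };
    return c;
}
```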