Introduction: What Happens Under the Hood When a DICOM File Is Written
Every DICOM file is, ultimately, a sequence of bytes on disk. Above that byte sequence, the standard layers a structure of tags, lengths, value representations, and values that lets diverse imaging systems exchange studies reliably. But the rules for how those structures are serialized — the byte order, the placement of the VR, the size of the length field — are governed by a small set of transfer syntaxes. For radiology IT engineers debugging migrations, building integration tools, or analyzing why a PACS rejects a study, understanding how DICOM tags are encoded on disk is invaluable.
This article walks through the binary anatomy of a DICOM tag, explains the difference between little-endian and big-endian byte order, contrasts implicit and explicit VR encodings, and shows what the file preamble and DICM marker look like. By the end you will be able to look at a hex dump of a DICOM file and identify each tag, length, and value. To inspect any DICOM file at the metadata level (without going to hex), use our DICOM Tag Viewer or browse the data dictionary in the DICOM Tag Browser.
For a higher-level reference on tag groups and meanings, see Understanding DICOM Tags. For the related topic of the data type system, read DICOM VR types reference. And for the de-identification angle on vendor-specific tags, see DICOM private tags explained.
The DICOM File Preamble and DICM Magic Number
A standard DICOM Part 10 file begins with a fixed structure:
- 128 bytes of preamble. Originally intended to allow the file to start with arbitrary application-specific data; when unused it is zero-filled, which is almost always the case in practice.
- 4 bytes of magic number: DICM in ASCII. This is how readers confirm a file is in DICOM Part 10 format.
- The File Meta Information group (group 0002), which is always encoded in Explicit VR Little Endian regardless of the transfer syntax of the rest of the file.
- The dataset, encoded according to the transfer syntax declared in (0002,0010) Transfer Syntax UID.
The 128-byte preamble exists for historical reasons. The DICM magic at offset 128 is the universal way file readers detect DICOM. Some tooling skips the preamble check; some intermediate systems strip it, leading to files that are technically valid datasets but not Part 10 conformant.
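The header check described above takes only a few lines. The following is a minimal Python sketch using only the standard library; the function names `has_part10_header` and `preamble_is_zero` are hypothetical, not part of any DICOM toolkit, and the sketch only inspects the first 132 bytes.

```python
# Minimal sketch: detect a DICOM Part 10 header in a byte buffer.
PREAMBLE_LEN = 128
MAGIC = b"DICM"


def has_part10_header(data: bytes) -> bool:
    """Return True if `data` holds a 128-byte preamble followed by b"DICM"."""
    if len(data) < PREAMBLE_LEN + len(MAGIC):
        return False  # too short to contain preamble + magic
    return data[PREAMBLE_LEN:PREAMBLE_LEN + len(MAGIC)] == MAGIC


def preamble_is_zero(data: bytes) -> bool:
    """The preamble is usually, but not necessarily, all zero bytes."""
    return data[:PREAMBLE_LEN] == bytes(PREAMBLE_LEN)


# Usage with a file on disk (the path is hypothetical):
# with open("study/IM0001.dcm", "rb") as f:
#     print(has_part10_header(f.read(132)))
```

A file whose bytes begin directly with group 0002 or 0008 fails this check even if its dataset is otherwise intact, which is exactly the "valid dataset, not Part 10" case described above.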
Byte Order: Little Endian Dominates
Computers can store multi-byte numbers in two orders:
- Little-endian: The least significant byte comes first. The 16-bit number 0x0010 is stored as 10 00 on disk.
- Big-endian: The most significant byte comes first. The same number is stored as 00 10.
DICOM defines transfer syntaxes for both byte orders, but in practice little-endian dominates. The default transfer syntax in DICOM is 1.2.840.10008.1.2, which is Implicit VR Little Endian. The most commonly used explicit-VR syntax is 1.2.840.10008.1.2.1, Explicit VR Little Endian. Big-endian transfer syntaxes exist but are rare and were retired from new use in 2006.
So when you read a DICOM tag (0010,0010) on disk, the bytes you see are 10 00 10 00: group 0010 as little-endian (10 00) followed by element 0010 as little-endian (10 00). Recognizing this pattern is the first skill of reading DICOM in a hex editor.
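Python's standard `struct` module makes the byte-order difference easy to see. This small sketch packs the tag (0010,0010) both ways, then shows what happens when little-endian bytes are read under a big-endian assumption:

```python
import struct

# A DICOM tag is two unsigned 16-bit integers: group, then element.
tag = (0x0010, 0x0010)  # Patient's Name

le = struct.pack("<HH", *tag)  # little-endian, as in Little Endian transfer syntaxes
be = struct.pack(">HH", *tag)  # big-endian, for comparison

print(le.hex(" "))  # 10 00 10 00
print(be.hex(" "))  # 00 10 00 10

# Reading little-endian bytes with the wrong (big-endian) assumption
# byte-swaps each field: (0010,0010) comes back as (1000,1000).
wrong = struct.unpack(">HH", le)
print("(%04X,%04X)" % wrong)  # (1000,1000)
```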
Implicit VR Little Endian: The Compact Format
In Implicit VR Little Endian (transfer syntax UID 1.2.840.10008.1.2), each data element is encoded as:
[group: 2 bytes LE] [element: 2 bytes LE] [length: 4 bytes LE] [value: bytes]
The VR is not stored in the file. To know whether the value is text, an integer, or a sequence, the reader must look up the tag in the public DICOM data dictionary.
For example, the tag (0010,0010) Patient’s Name with the value “DOE^JOHN^A” (10 characters) is encoded as:
10 00 10 00 <- tag (0010,0010) little-endian
0A 00 00 00 <- length 10 little-endian
44 4F 45 5E 4A 4F 48 4E 5E 41 <- "DOE^JOHN^A" in ASCII
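Implicit VR is often described as compact, but for most elements it saves nothing: its 4-byte length occupies the same four bytes that Explicit VR spends on a 2-byte VR plus a 2-byte length, and only long-form VRs carry extra header bytes in Explicit VR. Its real drawback is that a reader cannot interpret private tags whose VR it does not know. This is the major weakness of the format and a frequent cause of failures when private elements travel between systems.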
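A minimal sketch of Implicit VR Little Endian encoding and decoding follows. The `DICTIONARY` mapping is a hypothetical one-entry stand-in for the public data dictionary, and falling back to `UN` for unknown tags is an assumption of this sketch rather than behavior any particular toolkit is guaranteed to share. It illustrates the point above: the VR comes from the lookup, never from the bytes.

```python
import struct

# Hypothetical, tiny stand-in for the public data dictionary (PS3.6).
DICTIONARY = {(0x0010, 0x0010): "PN"}


def encode_implicit(group: int, element: int, value: bytes) -> bytes:
    """Encode one element as Implicit VR Little Endian: tag, 4-byte length, value."""
    return struct.pack("<HHI", group, element, len(value)) + value


def decode_implicit(buf: bytes, offset: int = 0):
    """Decode one element; the VR has to come from the dictionary, not the bytes."""
    group, element, length = struct.unpack_from("<HHI", buf, offset)
    value = buf[offset + 8: offset + 8 + length]
    vr = DICTIONARY.get((group, element), "UN")  # unknown (e.g. private) -> UN
    return (group, element), vr, value


encoded = encode_implicit(0x0010, 0x0010, b"DOE^JOHN^A")
print(encoded.hex(" "))
print(decode_implicit(encoded))
```

Decoding a private tag such as (0009,1001) with this sketch yields `UN`: the bytes are recovered, but their meaning is not.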
Explicit VR Little Endian: The Modern Default
In Explicit VR Little Endian (transfer syntax UID 1.2.840.10008.1.2.1), each data element places the two-letter VR after the tag, before the length. The length field size depends on the VR:
- Short-form VRs (most string and small numeric types: LO, SH, CS, UI, PN, DA, TM, US, UL, FL, FD, IS, DS, AS, AE, AT, etc.) use a 2-byte length:
[group: 2] [element: 2] [VR: 2] [length: 2] [value]
- Long-form VRs (OB, OW, OF, OD, OL, OV, SQ, SV, UT, UC, UR, UN, UV) use 2 reserved bytes followed by a 4-byte length:
[group: 2] [element: 2] [VR: 2] [reserved: 2] [length: 4] [value]
The same Patient’s Name encoded in Explicit VR Little Endian becomes:
10 00 10 00 <- tag (0010,0010)
50 4E <- VR "PN" in ASCII
0A 00 <- length 10 (2-byte form)
44 4F 45 5E 4A 4F 48 4E 5E 41 <- "DOE^JOHN^A"
Explicit VR costs no extra space for short-form elements (this Patient's Name element is 18 bytes in both encodings) and adds 4 header bytes only for long-form VRs, yet it solves the private-tag problem: any reader can determine the data type without needing the source system's dictionary. This is why Explicit VR Little Endian is recommended for any DICOM that will be shared across vendors.
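Both Explicit VR header layouts fit in one small encoder. This is a sketch under stated assumptions: defined lengths only (no undefined-length sequences), and a hard-coded set of long-form VRs that also includes the 64-bit VRs OV, SV and UV. The function name `encode_explicit` is hypothetical.

```python
import struct

# VRs that use 2 reserved bytes plus a 4-byte length field.
LONG_FORM_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV",
                 "UC", "UN", "UR", "UT", "UV"}


def encode_explicit(group: int, element: int, vr: str, value: bytes) -> bytes:
    """Encode one element as Explicit VR Little Endian with a defined length."""
    tag = struct.pack("<HH", group, element)
    if vr in LONG_FORM_VRS:
        # tag, VR, 2 reserved zero bytes, 4-byte length
        header = tag + vr.encode("ascii") + b"\x00\x00" + struct.pack("<I", len(value))
    else:
        if len(value) > 0xFFFF:
            raise ValueError("short-form VRs are limited to a 2-byte length")
        # tag, VR, 2-byte length
        header = tag + vr.encode("ascii") + struct.pack("<H", len(value))
    return header + value


pn = encode_explicit(0x0010, 0x0010, "PN", b"DOE^JOHN^A")
ob = encode_explicit(0x0002, 0x0001, "OB", b"\x00\x01")
print(pn.hex(" "))
print(ob.hex(" "))
```

The `PN` output reproduces the bytes shown above; the `OB` output is a (0002,0001) File Meta Information Version element, whose long-form header takes 12 bytes before the 2-byte value.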
The File Meta Information Group: Always Explicit VR Little Endian
Group 0002 — the File Meta Information — is special. Regardless of what transfer syntax the rest of the file uses, group 0002 is always encoded in Explicit VR Little Endian. This is so any reader can find (0002,0010) Transfer Syntax UID at the start of the file and determine how to decode the rest.
Group 0002 contains essential metadata:
- (0002,0000) File Meta Information Group Length (UL)
- (0002,0001) File Meta Information Version (OB, fixed to 00 01)
- (0002,0002) Media Storage SOP Class UID (UI)
- (0002,0003) Media Storage SOP Instance UID (UI)
- (0002,0010) Transfer Syntax UID (UI)
- (0002,0012) Implementation Class UID (UI)
- (0002,0013) Implementation Version Name (SH)
If you ever see a file that violates this rule — group 0002 not in Explicit VR Little Endian — it is non-conformant and many readers will refuse to parse it.
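Because group 0002 always uses Explicit VR Little Endian, a reader can walk it before it knows the dataset's transfer syntax. The sketch below assumes a well-formed Part 10 file whose meta elements all have defined lengths (as the standard requires) and stops at the first element outside group 0002; `read_file_meta` and `transfer_syntax_uid` are hypothetical names.

```python
import struct

# Long-form VRs carry 2 reserved bytes and a 4-byte length.
LONG_FORM_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV",
                 "UC", "UN", "UR", "UT", "UV"}


def read_file_meta(data: bytes) -> dict:
    """Return {(group, element): (vr, raw_value)} for group 0002 of a Part 10 file."""
    if data[128:132] != b"DICM":
        raise ValueError("missing DICM magic at offset 128")
    offset = 132
    meta = {}
    while offset + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, offset)
        if group != 0x0002:
            break  # dataset begins; its encoding depends on (0002,0010)
        vr = data[offset + 4: offset + 6].decode("ascii")
        if vr in LONG_FORM_VRS:
            (length,) = struct.unpack_from("<I", data, offset + 8)
            value_start = offset + 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            value_start = offset + 8
        meta[(group, element)] = (vr, data[value_start: value_start + length])
        offset = value_start + length
    return meta


def transfer_syntax_uid(meta: dict) -> str:
    """Extract (0002,0010), dropping the trailing null padding of UI values."""
    _, raw = meta[(0x0002, 0x0010)]
    return raw.rstrip(b"\x00 ").decode("ascii")
```

A real reader would then choose an implicit or explicit VR parser for the rest of the file based on the returned UID.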
Length Fields: Defined, Undefined, and Sequences
The length field tells the reader how many bytes the value occupies. Most elements use defined lengths: an exact byte count.
A few special cases use undefined length, encoded as FFFFFFFF:
- Sequences (SQ): A sequence may have undefined length, in which case it is terminated by a Sequence Delimitation Item (FFFE,E0DD).
- Items inside a sequence: Each item starts with an Item tag (FFFE,E000). Items can be defined-length or undefined-length; an undefined-length item is terminated by an Item Delimitation Item (FFFE,E00D).
- Encapsulated Pixel Data: In compressed transfer syntaxes, Pixel Data (7FE0,0010) has undefined length. Its value is a series of items: first a Basic Offset Table item (which may be empty), then the items holding the compressed fragments, terminated by a Sequence Delimitation Item.
Misreading these special markers is a common cause of parser failures. (FFFE,E000), (FFFE,E00D), and (FFFE,E0DD) are not regular elements — they are control structures within sequences and encapsulated data.
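The encapsulated Pixel Data layout can be walked with a short loop. This sketch assumes the caller has already consumed the (7FE0,0010) element header with its FFFFFFFF length and passes in the bytes that follow. Items and delimiters are read as a 4-byte tag plus a 4-byte length with no VR, which is how they are encoded even in Explicit VR. `split_encapsulated` is a hypothetical helper name.

```python
import struct

ITEM = (0xFFFE, 0xE000)
SEQ_DELIM = (0xFFFE, 0xE0DD)
UNDEFINED = 0xFFFFFFFF


def split_encapsulated(buf: bytes, offset: int = 0):
    """Split an encapsulated Pixel Data value into (basic_offset_table, fragments, end)."""
    items = []
    while True:
        # Control structures: tag (2 + 2 bytes) and a 4-byte length, no VR.
        group, element, length = struct.unpack_from("<HHI", buf, offset)
        offset += 8
        if (group, element) == SEQ_DELIM:
            break  # delimiter length is 0; the encapsulated value ends here
        if (group, element) != ITEM:
            raise ValueError(f"unexpected tag ({group:04X},{element:04X})")
        if length == UNDEFINED:
            raise ValueError("fragment items must have a defined length")
        items.append(buf[offset: offset + length])
        offset += length
    if not items:
        raise ValueError("missing Basic Offset Table item")
    return items[0], items[1:], offset
```

Treating (FFFE,E000) as an ordinary tag at this point, instead of as a fixed 8-byte item header, is the parser failure mentioned above.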
Compressed Transfer Syntaxes
For images, DICOM defines transfer syntaxes that compress the pixel data while keeping the metadata in Explicit VR Little Endian:
- JPEG Baseline (1.2.840.10008.1.2.4.50): Lossy 8-bit JPEG. Common for older archives.
- JPEG Lossless (1.2.840.10008.1.2.4.70): Mathematically lossless JPEG. Common for radiography.
- JPEG 2000 Lossless (1.2.840.10008.1.2.4.90): JPEG 2000 in lossless mode.
- RLE Lossless (1.2.840.10008.1.2.5): Run-length encoding, very lightweight.
- HTJ2K Lossless (1.2.840.10008.1.2.4.201): High-Throughput JPEG 2000 restricted to lossless coding, increasingly common.
In all of these, only the Pixel Data element is compressed; everything else is plain Explicit VR Little Endian. So if you can read the metadata of an uncompressed file, you can read the metadata of a compressed file too.
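In code, the practical consequence is that a metadata reader mainly needs to know whether a syntax is Implicit VR Little Endian; every syntax in the list above uses Explicit VR Little Endian outside Pixel Data. A minimal lookup-table sketch follows. It covers only the UIDs named in this article and deliberately raises `KeyError` for anything else (for example the retired big-endian syntax), so unknown syntaxes are not silently misread. The names are hypothetical.

```python
# UIDs named in this article: (display name, pixel data encapsulated?)
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": ("Implicit VR Little Endian", False),
    "1.2.840.10008.1.2.1": ("Explicit VR Little Endian", False),
    "1.2.840.10008.1.2.4.50": ("JPEG Baseline", True),
    "1.2.840.10008.1.2.4.70": ("JPEG Lossless", True),
    "1.2.840.10008.1.2.4.90": ("JPEG 2000 Lossless", True),
    "1.2.840.10008.1.2.5": ("RLE Lossless", True),
    "1.2.840.10008.1.2.4.201": ("HTJ2K Lossless", True),
}


def describe(ts_uid: str):
    """Return (name, implicit_vr, encapsulated); KeyError for unlisted UIDs."""
    name, encapsulated = TRANSFER_SYNTAXES[ts_uid]
    implicit = ts_uid == "1.2.840.10008.1.2"  # the only implicit-VR entry here
    return name, implicit, encapsulated
```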
Reading a DICOM File in a Hex Editor: A Worked Example
Suppose you open a DICOM file in a hex editor and see (after the 128-byte preamble and DICM at offset 128):
0000 0080: 44 49 43 4D DICM
0000 0084: 02 00 00 00 55 4C 04 00 BC 00 00 00 ....UL......
0000 0090: 02 00 02 00 55 49 1A 00 31 2E 32 2E 38 34 30 ....UI..1.2.840
0000 009F: 2E 31 30 30 30 38 2E 35 2E 31 2E 34 2E 31 2E 32 .10008.5.1.4.1.2
Reading byte by byte:
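- Offset 0x84: 02 00 00 00 = tag (0002,0000) File Meta Information Group Length.
- Then VR UL, length 04 00 = 4, value BC 00 00 00 = 188: that many bytes of group 0002 follow.
- Offset 0x90: 02 00 02 00 = tag (0002,0002) Media Storage SOP Class UID.
- VR UI, length 1A 00 = 26 bytes.
- Value: 1.2.840.10008.5.1.4.1.2, truncated here, but you can already see the dotted-decimal UID.
This excerpt is simplified: in a conformant file, (0002,0001) File Meta Information Version (VR OB, so 2 reserved bytes and a 4-byte length, 14 bytes in all) sits between these two elements, which would move (0002,0002) to offset 0x9E.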
With this skill you can verify DICOM file structure when a viewer fails, identify where parsing breaks, and confirm whether a file is corrupted at the byte level or only at the metadata level.
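The manual walk can be automated. This sketch decodes the first two rows of the dump (offsets 0x84 and 0x90), copied verbatim. It handles only short-form VRs such as UL and UI, which is all those rows contain, and returns whatever part of a value is present when the dump is truncated. `walk` is a hypothetical name.

```python
import struct

# Bytes from offsets 0x84 and 0x90 of the dump, copied verbatim.
DUMP = bytes.fromhex(
    "02 00 00 00 55 4C 04 00 BC 00 00 00 "
    "02 00 02 00 55 49 1A 00 31 2E 32 2E 38 34 30"
)


def walk(buf: bytes):
    """Yield (tag, vr, length, value) for short-form Explicit VR LE elements."""
    offset = 0
    while offset + 8 <= len(buf):
        group, element = struct.unpack_from("<HH", buf, offset)
        vr = buf[offset + 4: offset + 6].decode("ascii")
        (length,) = struct.unpack_from("<H", buf, offset + 6)  # short-form only
        value = buf[offset + 8: offset + 8 + length]  # may be cut off by the dump
        yield f"({group:04X},{element:04X})", vr, length, value
        offset += 8 + length


for tag, vr, length, value in walk(DUMP):
    shown = struct.unpack("<I", value)[0] if vr == "UL" else value.decode("ascii")
    print(tag, vr, length, shown)
```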
Encoding Pitfalls in Practice
- Odd-length string values must be padded. DICOM requires every value field to have an even length. String VRs pad with a trailing space (0x20) or, for UI, a trailing null (0x00). Tools that forget this produce non-conformant files.
- Wrong VR for a tag. If a writer encodes Patient's Name with VR LO instead of PN in Explicit VR, strict readers will reject it.
- Group 0002 in implicit VR. Readers expect group 0002 in Explicit VR Little Endian and may fail before reaching the rest of the file.
- Mixed transfer syntax inside a single file. The dataset must be uniform after group 0002. Concatenating fragments from different transfer syntaxes produces a corrupt file.
- Misinterpreted item delimiters. Tools that treat (FFFE,E000) as a regular tag try to look it up in the public dictionary and fail.
- Wrong endianness assumption. If a tool reads a Little Endian file as Big Endian, every multi-byte value is byte-swapped and tags are unrecognizable.
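A padding helper for the first pitfall might look like the following sketch. The VR sets are an assumption of this sketch: UI and OB are padded with 0x00, the listed text VRs with a space, and any other VR raises an error rather than guessing. `pad_even` is a hypothetical name.

```python
# Hypothetical padding rules for this sketch only.
NULL_PADDED = {"UI", "OB"}
SPACE_PADDED = {"AE", "AS", "CS", "DA", "DS", "DT", "IS", "LO", "LT",
                "PN", "SH", "ST", "TM", "UC", "UR", "UT"}


def pad_even(vr: str, value: bytes) -> bytes:
    """Pad an odd-length value to even length with the VR's padding byte."""
    if len(value) % 2 == 0:
        return value  # already even: nothing to do
    if vr in NULL_PADDED:
        return value + b"\x00"
    if vr in SPACE_PADDED:
        return value + b" "
    raise ValueError(f"no padding rule in this sketch for VR {vr}")
```

Calling this just before a value is encoded, rather than when it is first read from a source system, keeps the stored length and the padded value consistent.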
Best Practices for Radiology IT
- Use Explicit VR Little Endian as the default transfer syntax for any export or migration that crosses vendor boundaries.
- Verify the 128-byte preamble + DICM on every file ingested. Reject or repair files missing the Part 10 header.
- When debugging, look at group 0002 first. Most encoding issues cascade from a malformed File Meta Information.
- Pad odd-length values correctly. Validate file conformance before transmission.
- Preserve length-field width for long-form VRs. Off-by-two errors here corrupt every following element.
- Track which transfer syntaxes are supported by each system in your environment. Plan re-encoding steps when source and destination differ.
Conclusion
DICOM’s on-disk encoding is well defined but unforgiving. Little-endian byte order, the implicit/explicit VR distinction, the special role of group 0002, and the careful structure of length fields and item delimiters are all worth internalizing for any radiology IT engineer who works with imaging data at the byte level. Once you can read a DICOM file in a hex editor, you can debug almost any imaging workflow problem with confidence. Continue your reading with our companion articles on DICOM private tags, DICOM VR types, and the broader tag field reference.