Why Convert HL7 v2.x to JSON?
HL7 v2.x is a pipe-delimited, positional text format designed in the late 1980s. Its structure is precise but opaque to modern tooling: most REST APIs, NoSQL stores, ETL pipelines, and analytics platforms expect JSON. Converting HL7 messages to JSON makes them available to JavaScript and Python code, to SQL engines with JSON support, and to document databases like MongoDB or Elasticsearch, all without modifying downstream systems.
Anatomy of an HL7 v2.x Message
An HL7 v2.x message is a carriage-return-delimited sequence of segments. Each segment starts with a three-character name (such as MSH, PID, or OBX). Fields within a segment are separated by the field separator character (usually |). Fields may contain components separated by ^, subcomponents separated by &, and repetitions separated by ~. The MSH segment's first two fields establish these encoding characters: MSH-1 is the field separator itself, and MSH-2 lists the component, repetition, escape, and subcomponent characters in that order. A compliant parser therefore reads them dynamically rather than hard-coding |^~\&.
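As a minimal sketch of reading the delimiters dynamically (not this tool's implementation), the following Python reads MSH-1 and MSH-2 from the start of a message and uses them to split a segment. The names Encoding, read_encoding, and split_segments are hypothetical, and the sample message is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_encoding(message: str) -> Encoding:
    """Read the delimiters from the MSH segment instead of hard-coding them."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message must start with an MSH segment")
    field = message[3]                  # MSH-1: the character right after "MSH"
    comp, rep, esc, sub = message[4:8]  # MSH-2: the next four characters
    return Encoding(field, comp, rep, esc, sub)


def split_segments(message: str) -> list[str]:
    # Segments end with a carriage return; tolerate \n and \r\n as well.
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    return [s for s in normalized.split("\r") if s]


# Invented sample message (two segments).
msg = (
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401011200||ORU^R01|123|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||Smith^John^A\r"
)
enc = read_encoding(msg)
segments = split_segments(msg)
pid_fields = segments[1].split(enc.field)      # index 0 is the segment name
name_components = pid_fields[5].split(enc.component)
```

Because index 0 of the split holds the segment name, field N of any non-MSH segment sits at index N.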
Simplified Mode: Fast, Flat, Readable
In simplified mode, the converter emits one JSON object per segment occurrence, with field values keyed as SEG.N (e.g., PID.5 for the patient name field). Single-value fields become strings; fields with components become arrays. This format is easy to index in any document store and straightforward to query with jq or Python's json module. It gives up named component keys in exchange for compact, readable output.
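The shape described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the converter's code: the function name segment_to_simplified is hypothetical, the default delimiters are assumed, and MSH is not handled (its numbering is shifted by one because MSH-1 is the field separator itself).

```python
import json


def segment_to_simplified(segment: str, field_sep: str = "|", comp_sep: str = "^") -> dict:
    """Turn one non-MSH segment into a flat dict keyed as SEG.N."""
    name, *values = segment.split(field_sep)
    out = {}
    for n, raw in enumerate(values, start=1):
        if raw == "":
            out[f"{name}.{n}"] = None                  # empty field -> JSON null
        elif comp_sep in raw:
            out[f"{name}.{n}"] = raw.split(comp_sep)   # components -> array
        else:
            out[f"{name}.{n}"] = raw                   # single value -> string
    return out


pid = segment_to_simplified("PID|1||12345^^^HOSP^MR||Smith^John^A")
pid_json = json.dumps(pid)
```

Here pid["PID.5"] is the list of name components, and pid_json can be passed directly to jq or a document store.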
HAPI-Style Mode: Component-Named for System Integration
HAPI (HL7 Application Programming Interface) is the most widely used Java library for HL7 v2.x processing. When HAPI serializes a message to JSON, it names the components using the HL7 data-type descriptor, so rather than PID.5[0] you get {"FN":{"surname":"Smith"},"given":"John","middleInitialOrName":"A"}. Systems that emit or consume HAPI JSON (Epic Bridges, Rhapsody channels, Ensemble adapters) produce or expect this shape. If you are building a system that must match HAPI's output, selecting HAPI-style mode saves hours of re-implementation.
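To illustrate the idea of component naming, the sketch below maps the positional components of a patient-name value onto named keys. The key names are copied from the example above; the lookup tables and the function name_xpn are hypothetical illustrations and are not taken from HAPI's API.

```python
# Hypothetical name tables, using the key names from the example above.
XPN_COMPONENTS = ["FN", "given", "middleInitialOrName"]
FN_SUBCOMPONENTS = ["surname"]


def name_xpn(value: str, comp_sep: str = "^", sub_sep: str = "&") -> dict:
    """Replace positional components of a name field with named keys."""
    out = {}
    for key, comp in zip(XPN_COMPONENTS, value.split(comp_sep)):
        if not comp:
            continue  # skip empty components
        if key == "FN":
            # The family-name component is itself split into subcomponents.
            out[key] = {
                k: v for k, v in zip(FN_SUBCOMPONENTS, comp.split(sub_sep)) if v
            }
        else:
            out[key] = comp
    return out


named = name_xpn("Smith^John^A")
```

A full implementation would need one such table per HL7 data type; this sketch covers only the three name components shown in the example.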
Field Repetitions and Occurrence Tracking
HL7 segments can repeat: an ORU^R01 result message may have dozens of OBX segments, each carrying a different lab measurement. The converter assigns an occurrence index to repeated segments so OBX[1], OBX[2], OBX[3] stay distinct in the JSON output. Field repetitions (multiple values within one field, separated by ~) are preserved as arrays rather than collapsed into the first value.
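One way to track occurrences is a per-segment-name counter, sketched below. The output keys "segment", "occurrence", and "fields" and the function index_segments are assumptions made for illustration; the converter's actual output keys may differ. MSH would need special handling, because MSH-2 legitimately contains the repetition character.

```python
from collections import Counter


def index_segments(segments: list[str], field_sep: str = "|", rep_sep: str = "~") -> list[dict]:
    """Number repeated segments (1-based) and keep field repetitions as arrays."""
    seen = Counter()
    out = []
    for seg in segments:
        name, *fields = seg.split(field_sep)
        seen[name] += 1
        out.append({
            "segment": name,
            "occurrence": seen[name],  # OBX[1], OBX[2], ...
            "fields": [f.split(rep_sep) if rep_sep in f else f for f in fields],
        })
    return out


# Invented sample segments.
rows = index_segments([
    "OBX|1|NM|GLU||95",
    "OBX|2|NM|NA||140",
    "PID|1||123~456",
])
```

The two OBX segments receive occurrences 1 and 2, and the repeated PID field is kept as a two-element list rather than truncated to its first value.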
Escape Sequences and Unicode
HL7 v2.x uses a backslash escape mechanism to represent the five reserved encoding characters within field values: \F\ for the field separator (|), \S\ for the component separator (^), \R\ for the repetition separator (~), \T\ for the subcomponent separator (&), and \E\ for the escape character itself (\). When the unescape option is enabled, the converter resolves these to their literal characters in the JSON output. Keep it disabled when you need a round-trip-safe JSON representation that can be converted back to HL7 without data loss.
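The five escape sequences listed above can be resolved in a single regular-expression pass, as in this sketch. It assumes the default encoding characters (a stricter version would take them from MSH), and the names ESCAPES and unescape are hypothetical. Other escape sequences are left untouched.

```python
import re

# Assumes the default encoding characters |^~\&.
ESCAPES = {"F": "|", "S": "^", "R": "~", "T": "&", "E": "\\"}

# Matches \F\, \S\, \R\, \T\ and \E\ only.
_ESCAPE_RE = re.compile(r"\\([FSRTE])\\")


def unescape(value: str) -> str:
    """Resolve the five delimiter escapes in one pass.

    A single pass matters: a backslash produced by \\E\\ is never
    re-read as the start of another escape sequence.
    """
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(1)], value)


text = unescape("Smith \\T\\ Jones")
```

In this example the stored value "Smith \T\ Jones" becomes "Smith & Jones".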
Null and Empty Fields
In HL7 v2.x, an empty field (||) means "not provided" and a double-quote field (|""|) means "explicitly delete the previous value". The converter maps empty fields to JSON null to distinguish them from absent fields, a difference that matters when using the JSON to perform database updates where a null should overwrite an existing value but absence should leave it unchanged.
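The null-versus-absent distinction can be sketched as follows. The helper names to_fields and apply_update are hypothetical, and keeping the double-quote marker as a literal two-character string is an assumption of this sketch, not documented converter behavior.

```python
def to_fields(segment: str, field_sep: str = "|") -> dict:
    """Map empty fields to None; fields past the end of the segment get no key."""
    name, *values = segment.split(field_sep)
    # Assumption: the delete marker "" is kept as a literal two-character string.
    return {f"{name}.{n}": (v if v != "" else None) for n, v in enumerate(values, 1)}


def apply_update(record: dict, update: dict) -> dict:
    """Keys present in the update (including None) overwrite; absent keys stay."""
    merged = dict(record)
    merged.update(update)
    return merged


stored = {"PID.5": "Smith^John", "PID.8": "M"}
incoming = to_fields('PID|1||123||')   # PID.5 is present but empty
merged = apply_update(stored, incoming)
```

After the merge, PID.5 has been overwritten with None because the incoming segment contained it as an empty field, while PID.8 is unchanged because the incoming segment ended before reaching it.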
Performance for Large Messages
HL7 v2.x messages are typically small: a few hundred bytes to a few kilobytes. Even a complex ORU^R01 with 50 OBX segments converts in under a millisecond in any modern browser. Batch files (multiple messages concatenated in a single file) are not yet supported by this tool, which expects a single message per conversion. For batch processing, split the file on MSH lines and convert each message separately.
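Splitting on MSH lines can be done with a short pre-processing step like the one below. This is a sketch: the function name split_batch is hypothetical, and dropping the batch envelope segments (FHS, BHS, BTS, FTS) is a choice made here so that each result contains only one message.

```python
# Batch/file envelope segments that are not part of any single message.
BATCH_ENVELOPE = ("FHS", "BHS", "BTS", "FTS")


def split_batch(text: str) -> list[str]:
    """Split a batch file into individual messages, one per MSH segment."""
    lines = text.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    messages: list[str] = []
    current: list[str] = []
    for line in lines:
        if not line or line.startswith(BATCH_ENVELOPE):
            continue  # skip blank lines and envelope segments
        if line.startswith("MSH") and current:
            messages.append("\r".join(current) + "\r")  # flush previous message
            current = []
        current.append(line)
    if current:
        messages.append("\r".join(current) + "\r")
    return messages


# Invented sample batch with two messages.
batch = (
    "FHS|^~\\&\rBHS|^~\\&\r"
    "MSH|^~\\&|A\rPID|1\r"
    "MSH|^~\\&|B\rPID|2\r"
    "BTS|2\rFTS|1\r"
)
messages = split_batch(batch)
```

Each element of messages can then be passed to the converter on its own.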
Integration Use Cases
Common integration patterns that benefit from HL7-to-JSON conversion include: feeding HL7 demographics (PID, PV1) into a REST-based master patient index; populating Elasticsearch with lab results (OBX) for analytics; testing FHIR conversion logic by first parsing the source HL7 message to JSON then mapping JSON paths to FHIR resource fields; and logging messages in structured form to CloudWatch or Datadog where pipe-delimited text is not searchable.
Round-Trip Fidelity
For integration purposes, JSON is typically an intermediate form that does not need to reconstruct the original HL7 message. However, if round-trip fidelity is required (that is, generating HL7 from JSON), you must preserve null fields (do not omit them), preserve repetitions as ordered arrays, and keep escape sequences encoded (disable the unescape option). Simplified mode is less suited for round-trips because it omits component names; HAPI-style mode preserves enough structure for reconstruction with a compliant library.
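The three requirements above can be seen in a small reconstruction sketch. It assumes a hypothetical intermediate form in which each segment is a name plus an ordered list of fields, with None for empty fields and a list for repetitions; the function to_segment is not part of any library.

```python
def to_segment(name: str, fields: list, field_sep: str = "|", rep_sep: str = "~") -> str:
    """Rebuild one segment from an ordered field list."""

    def render(value) -> str:
        if value is None:
            return ""  # a null field keeps its position as an empty field
        if isinstance(value, list):
            return rep_sep.join(render(v) for v in value)  # repetitions, in order
        return value   # strings pass through, escape sequences still encoded

    return field_sep.join([name] + [render(v) for v in fields])


rebuilt = to_segment(
    "PID",
    ["1", None, "123^^^HOSP^MR", None, ["Smith^John", "Smyth^Jon"], "A\\T\\B"],
)
```

If the null entries had been omitted, every later field would shift left, and if the escape sequence had been resolved to &, the rebuilt field would be misread as two subcomponents.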