Why Convert HL7 v2.x to JSON?
HL7 v2.x is a pipe-delimited, positional text format designed in the late 1980s. Its structure is precise but opaque to modern tooling: most REST APIs, NoSQL stores, ETL pipelines, and analytics platforms expect JSON. Converting HL7 messages to JSON unlocks them for SQL queries, JavaScript processing, Python data analysis, and storage in document databases like MongoDB or Elasticsearch β without modifying downstream systems.
Anatomy of an HL7 v2.x Message
An HL7 v2.x message is a carriage-return-delimited sequence of segments. Each segment starts with a three-character name (MSH, PID, OBX). Fields within a segment are separated by the field separator character (usually |). Fields may contain components separated by ^, subcomponents separated by &, and repetitions separated by ~. The MSH segment's first two fields establish these encoding characters, so a compliant parser reads them dynamically rather than hard-coding |^~\&.
Simplified Mode: Fast, Flat, Readable
In simplified mode, the converter emits one JSON object per segment occurrence, with field values keyed as SEG.N (e.g., PID.5 for the patient name field). Single-value fields become strings; fields with components become arrays. This format is easy to index in any document store and straightforward to query with jq or Python's json module. It trades component-key verbosity for readability.
HAPI-Style Mode: Component-Named for System Integration
HAPI (HL7 Application Programming Interface) is the most widely used Java library for HL7 v2.x processing. When HAPI serializes a message to JSON, it names the components using the HL7 data-type descriptor β so rather than PID.5[0] you get {"FN":{"surname":"Smith"},"given":"John","middleInitialOrName":"A"}. Systems that emit or consume HAPI JSON (Epic Bridges, Rhapsody channels, Ensemble adapters) produce or expect this shape. If you are building a system that must match HAPI's output, selecting HAPI-style mode saves hours of re-implementation.
Field Repetitions and Occurrence Tracking
HL7 segments can repeat β an ORU^R01 result message may have dozens of OBX segments, each carrying a different lab measurement. The converter assigns an occurrence index to repeated segments so OBX[1], OBX[2], OBX[3] stay distinct in the JSON output. Field repetitions (multiple values within one field, separated by ~) are preserved as arrays rather than collapsed into the first value.
Escape Sequences and Unicode
HL7 v2.x uses a backslash escape mechanism to represent the five reserved encoding characters within field values: \F\ for the field separator (|), \S\ for the component separator (^), \R\ for the repetition separator (~), \T\ for the subcomponent separator (&), and \E\ for the escape character itself (\). When the unescape option is enabled, the converter resolves these to their literal characters in the JSON output. Keep it disabled when you need a round-trip-safe JSON representation that can be converted back to HL7 without data loss.
Null and Empty Fields
In HL7 v2.x, an empty field (||) means "not provided" and a double-quote field (|""|) means "explicitly delete the previous value". The converter maps empty fields to JSON null to distinguish them from absent fields β a difference that matters when using the JSON to perform database updates where a null should overwrite an existing value but absence should leave it unchanged.
Performance for Large Messages
HL7 v2.x messages are typically small β a few hundred bytes to a few kilobytes. Even a complex ORU^R01 with 50 OBX segments converts in under a millisecond in any modern browser. Batch files (multiple messages concatenated in a single file) are not yet supported by this tool, which expects a single message per conversion. For batch processing, split the file on MSH lines and convert each message separately.
Integration Use Cases
Common integration patterns that benefit from HL7-to-JSON conversion include: feeding HL7 demographics (PID, PV1) into a REST-based master patient index; populating Elasticsearch with lab results (OBX) for analytics; testing FHIR conversion logic by first parsing the source HL7 message to JSON then mapping JSON paths to FHIR resource fields; and logging messages in structured form to CloudWatch or Datadog where pipe-delimited text is not searchable.
Round-Trip Fidelity
For integration purposes, JSON is typically an intermediate form that does not need to reconstruct the original HL7 message. However, if round-trip fidelity is required β generating HL7 from JSON β you must preserve null fields (do not omit them), preserve repetitions as ordered arrays, and keep escape sequences encoded (disable the unescape option). Simplified mode is less suited for round-trips because it omits component names; HAPI-style mode preserves enough structure for reconstruction with a compliant library.
In production integration programs, HL7-to-JSON conversion is often the bridge between legacy feeds and modern observability or API layers. Teams use the JSON output to feed staging databases, compare pre-transform and post-transform payloads in automated tests, and create searchable structured logs for incident response. Because the output is generated locally, analysts can validate mappings with realistic messages before copying the resulting JSON into ETL jobs, transformation specs, or FHIR conversion notebooks.
The converter is also useful as a contract clarification tool when multiple teams disagree about what a field means. Instead of debating pipe counts in email threads, you can inspect the exact parsed structure, confirm how repetitions and components are being read, and document the agreed JSON path that downstream consumers should use. That reduces ambiguity in interface tickets and makes it easier to align HL7 analysts, backend engineers, and data teams around the same message model.