Why Validate Before You Send
Most HL7 v2.x interfaces in the wild are technically non-conformant in some way: a missing trailing field, a non-standard ack code, a date in YYYY-MM-DD instead of YYYYMMDD. Many of these survive in production for years because the receiving system happens to be tolerant. But the day you change vendor, switch interface engines, or migrate to a stricter platform such as a HAPI-based validator, those latent bugs surface as failed message processing. Pre-flight validation lets you find and fix them on your own schedule rather than in the middle of an integration go-live.
The Three Layers of HL7 v2.x Validation
Layer 1: Encoding and structure. The first character after MSH is MSH-1, the field separator (almost always |). The next four characters are MSH-2, the encoding characters: component (^), repetition (~), escape (\), and subcomponent (&). A validator must read these dynamically and reject messages that announce one set of encoding characters in MSH-2 but use different ones in the body.
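As an illustration of reading the delimiters dynamically, here is a minimal Python sketch. The names Delimiters and read_delimiters are hypothetical and are not taken from this validator; the sketch only handles the four encoding characters described above and ignores the optional fifth (truncation) character that HL7 v2.7 and later may declare.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    """Separators declared by a message (hypothetical helper type)."""
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Delimiters:
    """Read MSH-1 and MSH-2 from the start of a message instead of assuming defaults."""
    if not message.startswith("MSH"):
        raise ValueError("message does not start with an MSH segment")
    if len(message) < 8:
        raise ValueError("MSH segment is too short to declare encoding characters")
    field = message[3]       # MSH-1: the character right after "MSH"
    encoding = message[4:8]  # MSH-2: component, repetition, escape, subcomponent
    # All five delimiters must differ, otherwise the message cannot be parsed unambiguously.
    if field in encoding or len(set(encoding)) != 4:
        raise ValueError("delimiters declared in MSH-1/MSH-2 are not distinct")
    return Delimiters(field, *encoding)
```

Every later check can then take a Delimiters value as input rather than hard-coding | and ^, which is what makes non-default senders parse correctly.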
Layer 2: Required segments and fields. Each message type (the value in MSH-9, e.g. ADT^A01) has a defined structure: which segments must appear, which are optional, and which can repeat. An ADT^A01 requires MSH, EVN, PID, and PV1; missing PV1 is an error. An ORU^R01 requires MSH, PID, OBR, and at least one OBX. The validator walks the message against the message-type structure and flags missing required segments.
Layer 3: Datatypes and code tables. Even when the structure is correct, individual fields may carry invalid values. PID-8 (administrative sex) must be one of M, F, O, U, A, N. PV1-2 (patient class) must be one of E, I, O, P, R, B, C, N, U. MSA-1 (acknowledgment code) must be one of AA, AE, AR, CA, CE, CR. Datetime fields like MSH-7 must conform to the HL7 date/time format (YYYYMMDDHHMMSS[.S[S[S[S]]]][+/-ZZZZ]; the standard also permits truncated precision such as YYYYMMDD). Catching these field-level violations before sending lets you correct them before a downstream parser rejects the message.
Errors vs Warnings: How This Validator Triages
An error means the message will fail at most receivers: missing MSH, a missing required segment, malformed encoding. A warning means the message is accepted by tolerant receivers but rejected by strict ones: non-standard code values, weakly typed dates, missing optional but commonly required fields. An info-level finding flags patterns that pass the structural checks but look suspicious in real traffic. One example is a PID with an empty PID-3 (patient identifier list), which some local profiles and tolerant receivers allow but which almost always indicates an upstream MPI issue.
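One way to model this three-level triage is an ordered severity plus a small finding record. The Python sketch below is illustrative only: Severity, Finding, worst_severity, and is_sendable are hypothetical names, and the strict_receiver switch is an assumption about how a caller might use the levels, not a documented option of this validator.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordered so that findings can be compared and ranked."""
    INFO = 0
    WARNING = 1
    ERROR = 2


@dataclass(frozen=True)
class Finding:
    severity: Severity
    location: str   # e.g. "PID-8" or "PV1"
    message: str


def worst_severity(findings):
    """Return the highest severity present, or None when there are no findings."""
    return max((f.severity for f in findings), default=None)


def is_sendable(findings, strict_receiver=False):
    """Errors always block; warnings block only when the receiver is strict."""
    threshold = Severity.WARNING if strict_receiver else Severity.ERROR
    return all(f.severity < threshold for f in findings)
```

Using an ordered enum keeps the policy decision (what blocks a send) separate from the rules that produce findings.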
Why MSH-2 Mismatches Are So Common
MSH-2 declares the encoding characters used in the rest of the message. The default value is ^~\&. A common bug pattern is a sender that hard-codes the standard encoding string in MSH-2 but actually emits a different escape character in field bodies (for example, % instead of \). Receivers that read MSH-2 dynamically (the correct behavior) then misinterpret the escape sequences or leave them unresolved. The validator detects this by checking that escape sequences appearing in field values use the character declared in MSH-2.
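The check described above can be approximated with a heuristic: look for text shaped like a standard escape sequence (F, S, T, R, E, H, N, or .br wrapped in the same punctuation character on both sides) where the wrapping character is not one of the declared delimiters. The Python sketch below is an assumption about how such a check might work, not this validator's actual implementation; find_escape_mismatches is a hypothetical name, and the heuristic can produce false positives on free text such as "-E-".

```python
import re

# Bodies of the standard HL7 v2 escape sequences: field, component (S),
# subcomponent (T), repetition, escape, highlight on/off, and line break.
_ESCAPE_BODY = r"(?:F|S|T|R|E|H|N|\.br)"
# The same non-alphanumeric character on both sides of an escape body.
_CANDIDATE = re.compile(r"([^A-Za-z0-9\s])" + _ESCAPE_BODY + r"\1")


def find_escape_mismatches(message: str) -> list[str]:
    """Return escape-like sequences wrapped in a character MSH-2 did not declare."""
    field_sep = message[3]
    declared = set(message[4:8]) | {field_sep}
    suspects = []
    for segment in re.split(r"[\r\n]+", message.strip()):
        fields = segment.split(field_sep)
        # Skip the segment ID, and for MSH also skip MSH-2 itself.
        start = 2 if fields[0] == "MSH" else 1
        for value in fields[start:]:
            for match in _CANDIDATE.finditer(value):
                if match.group(1) not in declared:
                    suspects.append(match.group(0))
    return suspects
```

For a message that declares ^~\& but carries "Smith %T% Sons" in a note, the function reports %T%, while a correctly escaped \T\ is left alone.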
Required-Segment Rules by Message Type
This validator enforces required segments for the most common message types:
- ADT (admission, discharge, transfer): MSH, EVN, PID, PV1
- ORM / OMG / OMI / OML / OMP / OMS / OMD (orders): MSH, PID, ORC, OBR
- ORU / ORF / ORG / ORI / ORL / ORN / ORP / ORR / ORS (results): MSH, PID, OBR, OBX
- SIU (scheduling): MSH, SCH, PID, AIS or AIG or AIL or AIP
- MDM (medical document management): MSH, EVN, PID, TXA
- ACK (acknowledgment): MSH, MSA
- QRY / RSP (queries and responses): MSH, QPD or QAK
- RDE (pharmacy/treatment encoded order): MSH, PID, ORC, RXE
- VXU (vaccination update): MSH, PID, RXA
- DFT (detailed financial transaction): MSH, EVN, PID, FT1
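The list above maps naturally onto a lookup table keyed by the message code in MSH-9. The following Python sketch shows one possible shape for that check; REQUIRED and missing_segments are hypothetical names, only the lead code of each family is included for brevity, and the sketch checks presence only, not order or repetition.

```python
import re

# Required segments per message code, copied from the list above.
# Each inner tuple is a set of alternatives: at least one must be present.
REQUIRED = {
    "ADT": [("MSH",), ("EVN",), ("PID",), ("PV1",)],
    "ORM": [("MSH",), ("PID",), ("ORC",), ("OBR",)],
    "ORU": [("MSH",), ("PID",), ("OBR",), ("OBX",)],
    "SIU": [("MSH",), ("SCH",), ("PID",), ("AIS", "AIG", "AIL", "AIP")],
    "MDM": [("MSH",), ("EVN",), ("PID",), ("TXA",)],
    "ACK": [("MSH",), ("MSA",)],
    "RDE": [("MSH",), ("PID",), ("ORC",), ("RXE",)],
    "VXU": [("MSH",), ("PID",), ("RXA",)],
    "DFT": [("MSH",), ("EVN",), ("PID",), ("FT1",)],
}


def missing_segments(message: str) -> list[str]:
    """Return the required segments (or alternative groups) absent from the message."""
    segments = [s for s in re.split(r"[\r\n]+", message.strip()) if s]
    field_sep = segments[0][3]
    comp_sep = segments[0][4]
    msh = segments[0].split(field_sep)
    # After splitting, index 8 holds MSH-9 because MSH-1 is the separator itself.
    message_code = msh[8].split(comp_sep)[0] if len(msh) > 8 else ""
    present = {s[:3] for s in segments}
    # Unknown message codes fall back to requiring only MSH.
    rules = REQUIRED.get(message_code, [("MSH",)])
    return [" or ".join(alts) for alts in rules if not present.intersection(alts)]
```

An ADT^A01 that carries MSH, EVN, and PID but no PV1 yields ["PV1"]; an SIU with no resource segment yields ["AIS or AIG or AIL or AIP"].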
Datatype Validation: DTM, DT, TM, NM
HL7 v2.x defines several primitive datatypes whose format you can validate without consulting a separate registry. DTM (date/time): YYYYMMDDHHMMSS with optional fractional seconds and timezone offset. DT (date): YYYYMMDD. TM (time): HHMMSS. NM (numeric): optional sign followed by digits and an optional decimal portion. The validator runs format checks against these datatypes whenever the schema dictates a typed value (for example, MSH-7 is a date/time, and OBX-5 is variable but checked against the value-type code in OBX-2).
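These format checks are a good fit for regular expressions. The Python sketch below is one possible encoding, not this validator's actual rules: PATTERNS and conforms are hypothetical names, and the patterns accept the truncated precision the HL7 standard permits (for example YYYYMM), which a stricter validator might choose to reject. The patterns check digit ranges only, so an impossible calendar date such as 20240230 still passes.

```python
import re

_TIME = r"(?:[01]\d|2[0-3])(?:[0-5]\d(?:[0-5]\d(?:\.\d{1,4})?)?)?"  # HH[MM[SS[.S...]]]
_DATE = r"\d{4}(?:(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])?)?"      # YYYY[MM[DD]]
_ZONE = r"(?:[+-]\d{4})?"                                            # optional +/-ZZZZ

PATTERNS = {
    # Date, then time only when the date is complete to the day.
    "DTM": re.compile(
        r"\d{4}(?:(?:0[1-9]|1[0-2])(?:(?:0[1-9]|[12]\d|3[01])(?:" + _TIME + r")?)?)?" + _ZONE
    ),
    "DT": re.compile(_DATE),
    "TM": re.compile(_TIME + _ZONE),
    # Optional sign, digits, optional decimal portion.
    "NM": re.compile(r"[+-]?\d+(?:\.\d+)?"),
}


def conforms(datatype: str, value: str) -> bool:
    """Return True when the whole value matches the format for the datatype."""
    pattern = PATTERNS.get(datatype)
    if pattern is None:
        raise KeyError(f"no format rule for datatype {datatype!r}")
    return pattern.fullmatch(value) is not None
```

With these patterns, conforms("DTM", "20240315093000") is true and conforms("DTM", "2024-03-15T09:30") is false, because the ISO 8601 separators are not part of the HL7 format.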
Code-Table Validation
The validator includes the most operationally important code tables: HL7 0001 (administrative sex), HL7 0004 (patient class), HL7 0008 (acknowledgment code), HL7 0103 (processing ID), HL7 0119 (order control), and HL7 0125 (value type). When a field bound to one of these tables carries a value that is not on the list, the validator raises a warning rather than an error. Many systems intentionally extend these tables with site-specific values, so the cleanest action is to flag the value and let the analyst decide.
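A code-table check that warns instead of failing, and that tolerates site-specific extensions, might look like the Python sketch below. It is a simplified illustration: TABLES, FIELD_TABLE, and code_table_warnings are hypothetical names, only three of the tables are included, and the default delimiters | and ^ are assumed rather than read from MSH.

```python
import re

# Allowed values for three of the tables named in this article.
TABLES = {
    "0001": {"M", "F", "O", "U", "A", "N"},
    "0004": {"E", "I", "O", "P", "R", "B", "C", "N", "U"},
    "0008": {"AA", "AE", "AR", "CA", "CE", "CR"},
}
# Which (segment, field number) is bound to which table.
FIELD_TABLE = {("PID", 8): "0001", ("PV1", 2): "0004", ("MSA", 1): "0008"}


def code_table_warnings(message: str, site_extensions=None) -> list[str]:
    """Return warning strings for coded values that are not in their HL7 table.

    site_extensions maps a table number to extra locally accepted values.
    """
    extensions = site_extensions or {}
    warnings = []
    for segment in re.split(r"[\r\n]+", message.strip()):
        fields = segment.split("|")  # assumes the default field separator
        for (seg_id, index), table in FIELD_TABLE.items():
            if fields[0] != seg_id or index >= len(fields):
                continue
            value = fields[index].split("^")[0]  # first component only
            allowed = TABLES[table] | set(extensions.get(table, ()))
            if value and value not in allowed:
                warnings.append(f"{seg_id}-{index}: {value!r} is not in HL7 table {table}")
    return warnings
```

Passing site_extensions={"0001": {"X"}} suppresses the warning for a locally agreed sex code X without weakening the check for other sites.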
Practical Workflow: Validate, Fix, Re-Send
A typical use of the validator looks like this. An interface analyst receives a complaint that an ADT feed is being rejected by the downstream EHR. They paste the failing message into the validator and immediately see two errors: missing PV1 and a malformed MSH-7 (2024-03-15T09:30 instead of 20240315093000). They flag the upstream sending application, attach the validation report to the ticket, and the upstream team knows exactly what to fix. The cycle that used to require pulling logs from the interface engine, eyeballing pipe-delimited text, and arguing with a vendor support contact collapses to under a minute.
What This Validator Does Not Do
This validator does not perform full conformance against an external profile (for example, a vendor's customized message specification or a country-specific implementation guide). It does not check business-logic rules like 'OBR-8 (observation end date/time) must be on or after OBR-7 (observation date/time)' or 'PV1-44 must be set when PV1-2 = I'. It does not validate Z-segments (custom segments outside the HL7 standard). For those checks, you need a profile-aware validator such as the HL7 Messaging Workbench (MWB), NIST's HL7 v2 conformance testing tools, or a HAPI-based custom validator. The role of this tool is to catch the large share of issues that are independent of any profile and would block any downstream processor.
From Validation to Resolution
Once the validator surfaces an issue, the next step is fixing the upstream system that produces the message. For required-segment errors, work with the sending application to add the missing segment or, if the segment is genuinely optional in your version of the spec, document the deviation and configure the receiver to accept it. For datatype errors, fix the source formatter. For code-table warnings, either align the value to the standard table or extend the receiver's allowed values explicitly. The validator gives you the precise rule citation needed to drive that conversation.