A Practical Guide To Implementing ONIX - part 3
Rules of XML
As we’ve already mentioned, an ONIX message is an XML document that conforms to a DTD maintained by EDItEUR. However, XML has a few basic rules of its own that a document must follow to be considered correct, or “well formed”. If you can get your head around them, you’ll find ONIX messages fairly straightforward to understand (though not necessarily straightforward to implement, as we’ll find out later).
XML Rule 1
Every opening tag of an element must have a matching closing tag.
For example, the following is not well formed because there is no closing Address1 tag:
<Address>
<Address1>22 Arcacia Avenue
</Address>
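Rule 1 is easy to check mechanically. The following is a minimal sketch using the XML parser from Python's standard library; the helper name is_well_formed is our own invention for illustration, not part of ONIX or of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for problems such as a missing or mismatched closing tag.
        return False


# The broken example from above: Address1 is never closed.
broken = "<Address><Address1>22 Arcacia Avenue</Address>"
# The same data with the closing Address1 tag added.
fixed = "<Address><Address1>22 Arcacia Avenue</Address1></Address>"

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```

Running a check like this before sending a file catches well-formedness mistakes early, although it says nothing about whether the message obeys the ONIX DTD.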
XML Rule 2
Element names are case sensitive, so <Tag1> and <tag1> are different elements. The content text between the tags can be in whatever case you like.
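A short sketch of what case sensitivity means in practice, again using Python's standard library parser. The element names Message, Tag1 and tag1 are placeholders rather than real ONIX tags, and closes_cleanly is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Tag1 and tag1 are treated as two unrelated elements.
doc = ET.fromstring("<Message><Tag1>first</Tag1><tag1>second</tag1></Message>")
child_names = [child.tag for child in doc]
print(child_names)  # ['Tag1', 'tag1']


def closes_cleanly(xml_text: str) -> bool:
    """Return True if every opening tag is closed by a tag of the same case."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Closing <Tag1> with </tag1> breaks Rule 1, because the names differ.
print(closes_cleanly("<Tag1>text</tag1>"))  # False
print(closes_cleanly("<Tag1>text</Tag1>"))  # True
```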
XML Rule 3
Certain characters are not allowed in XML content and have to be replaced or “escaped” if the XML is to be considered well formed. In particular, the ‘&’ and ‘<’ characters must be converted if they appear between tags. The ‘&’ character should be replaced with the special code (or “escape sequence”) &amp; and the ‘<’ character should be replaced with the escape sequence &lt;.
Once an intended recipient loads the XML file and extracts the content between the tags, they will convert the escape sequences back to the original characters so that the text is displayed correctly.
In the following sample XML,
<tag1>Barnes & Noble</tag1>,
the & symbol should be replaced by &amp; to look like this:
<tag1>Barnes &amp; Noble</tag1>.
Once the XML has been transmitted, the recipient would extract the ‘Barnes &amp; Noble’ text and replace the &amp; with the & symbol so that it reads ‘Barnes & Noble’ again.
Note that the < characters that form the tags themselves do not have to be escaped, only the content between the tags.
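In practice you would rarely do this replacement by hand. As a minimal sketch, Python's standard library provides escape and unescape functions in xml.sax.saxutils, and its XML parser converts the escape sequences back automatically when the recipient reads the content. The tag name tag1 is the placeholder used above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "Barnes & Noble"

# Sender side: escape the content before placing it between the tags.
escaped = escape(raw)
print(escaped)  # Barnes &amp; Noble

message = f"<tag1>{escaped}</tag1>"

# Recipient side: the parser turns &amp; back into & when reading the text.
received = ET.fromstring(message).text
print(received)  # Barnes & Noble

# unescape() does the same conversion on a bare string.
print(unescape(escaped))  # Barnes & Noble

# The '<' character is handled in the same way.
print(escape("1 < 2"))  # 1 &lt; 2
```

Note that escape is applied only to the content, never to the whole message, otherwise the tags themselves would be escaped too.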
Valid characters that can appear in an XML document include:
Space character
Capital letters: A – Z
Lower-case letters: a – z
Digits: 0 – 9
Punctuation and brackets: !"'(),-.:;?[]{}
Currency, arithmetic, computer and other symbols: #$%*+/=>\@_`|~
Any characters not included above, such as foreign characters or scientific symbols, may need to be “escaped” with appropriate escape sequences, depending on the character encoding the document uses.
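One standard XML escape sequence for such characters is the numeric character reference, which has the form &#233; (the character's number between &# and ;). The sketch below assumes the recipient wants a file restricted to the characters listed above; whether a given recipient prefers this or some other approach is something to confirm with them. The sample text and the tag name tag1 are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "Café & Bar"

# Escape & and < first, then replace every non-ASCII character with a
# numeric character reference using Python's built-in error handler.
ascii_safe = escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_safe)  # Caf&#233; &amp; Bar

# The recipient's parser restores the original characters.
received = ET.fromstring(f"<tag1>{ascii_safe}</tag1>").text
print(received == text)  # True
```

The order matters: escaping after inserting the character references would turn the & in &#233; into &amp; and corrupt the text.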
DTDs
Now that you know the basic rules for an XML document, the only other thing you need to know is how to link it to a DTD (which, as we now know, holds the rules that dictate how the XML elements should be structured).
The link is achieved by adding a special DOCTYPE declaration at the top of the XML document:
<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">
The significant bit is “http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd” as this is the path to the DTD, which in this case is stored on the EDItEUR web server.
The number 2.1 in the text actually refers to the version of ONIX to be used whilst the following 02 refers to the sub-version.
Sub-versions represent minor modifications made to the ONIX standard between major releases, such as tag names changing or minor changes to the XML structure (see the DTD declaration section of the XML Message documentation for a discussion of versions).
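If you receive ONIX files as well as send them, it can be handy to read the version and sub-version out of the DOCTYPE declaration before deciding how to process a file. The following is a rough sketch that relies only on the URL layout of the example above; the function name onix_version is hypothetical, and a DOCTYPE that points somewhere else (a local copy of the DTD, for instance) would simply return None.

```python
import re
from typing import Optional, Tuple

DOCTYPE = (
    '<!DOCTYPE ONIXMessage SYSTEM '
    '"http://www.editeur.org/onix/2.1/02/reference/onix-international.dtd">'
)


def onix_version(doctype: str) -> Optional[Tuple[str, str]]:
    """Return (version, sub_version) from a DOCTYPE line, or None."""
    # ASSUMPTION: the DTD location contains /onix/<version>/<sub-version>/
    # as in the EDItEUR example; other layouts are not recognised.
    match = re.search(r'SYSTEM\s+"[^"]*/onix/(\d+\.\d+)/(\d+)/', doctype)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(onix_version(DOCTYPE))  # ('2.1', '02')
print(onix_version('<!DOCTYPE ONIXMessage SYSTEM "onix.dtd">'))  # None
```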
You must bear in mind that the ONIX standard is constantly evolving: people from all over the world contribute ideas and suggestions that fix problems they come across, or improve the standard, as they try to implement ONIX within their own organisations. A Yahoo forum has been set up by EDItEUR at http://groups.yahoo.com/group/ONIX_IMPLEMENT/ which, in their words:
“will act as the forum for asking questions about the interpretation of ONIX standards, raising any practical problems which need to be addressed in future releases, and participating in discussion aimed at finding the best solutions”
Any ideas or issues that come through the forum are discussed by national ONIX committees and if approved are included in the next release of the standard.
We strongly recommend you sign up to the forum, as it has a complete history of issues and discussions about the standard, and the chances are that any technical question you have has already been answered there. You can also see the kinds of issues being discussed today which may become parts of the standard tomorrow, although EDItEUR have stated that they do not foresee any major revisions of the standard in the near future (more tweaking around the edges).
When a release comes out, a new version of the ONIX DTD is created (with a new sub-version number) to ensure backward compatibility, since not everybody will make the necessary changes at the same time. EDItEUR keep all versions of the DTDs available on their web site.
One of the most contentious aspects of the ONIX standard is this wide variation of versions and interpretations. As the standard has evolved, the structure of ONIX messages has changed, and at any given time different people are able to produce and receive different versions of an ONIX message. So unfortunately, instead of being able to produce a single ONIX message you can send to everyone, you have to produce different versions for different recipients, and the “standard” isn’t quite as standard as one might have hoped. Practically, this means that you can’t simply create a one-off project to “do ONIX”. Instead you must put flexible systems in place that allow you to modify your ONIX message feeds in sync with the standard and with your recipients’ requirements for data as they change over time.
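One simple way to build in that flexibility is to keep the version each recipient expects in configuration rather than hard-coding it into the export routine. The sketch below is only an illustration of the idea: the recipient names are made up, the class and function names are our own, and it assumes that every release's DTD sits at a URL with the same layout as the 2.1/02 example, which you should verify against the EDItEUR web site before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnixRelease:
    version: str      # e.g. "2.1"
    sub_version: str  # e.g. "02"

    def doctype(self) -> str:
        # ASSUMPTION: all releases follow the URL pattern of the 2.1/02 DTD.
        return (
            '<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/'
            f'{self.version}/{self.sub_version}'
            '/reference/onix-international.dtd">'
        )


# Hypothetical recipients; in a real system this would live in a
# configuration file or database so it can change without a code release.
RECIPIENT_RELEASES = {
    "example-retailer": OnixRelease("2.1", "02"),
    "example-wholesaler": OnixRelease("2.1", "01"),
}


def doctype_for(recipient: str) -> str:
    """Return the DOCTYPE line to use for a given recipient's feed."""
    return RECIPIENT_RELEASES[recipient].doctype()


print(doctype_for("example-retailer"))
```

When a recipient moves to a newer sub-version, only the configuration entry changes. Any structural differences between versions would still need their own handling in the code that builds the message body.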
