
A Practical Guide To Implementing ONIX - part 2

So What Really Is an ONIX Message?

Put simply, an ONIX message is a text file formatted in a very specific, agreed way so that anyone who receives the file knows exactly how to interpret its contents. More formally, an ONIX message is a set of data elements defined by ‘tags’ written in XML.

Whoa, back up. XML? Tags?

XML (eXtensible Markup Language) is a text-based markup language for structuring information. Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays. For example, content in a section heading has a different meaning from content in a footnote, which in turn means something different from content in a figure caption or a database table. Almost all documents have some structure, and XML is a standard that allows that document or data structure to be defined precisely.

If you’re not 100% clear about what XML is or what it does, peruse a few of the definitions that the Google search engine brings up: http://www.google.com/search?num=100&q=define:XML

In XML, “tags” or “elements” (people tend to use these terms interchangeably) mark the beginning and end of a piece of data, and these tags can be grouped and nested to represent data meaningfully.

For example the following address:

22 Arcacia Avenue
Little Hampton
Dorset
D45 9JK

could be represented in XML as follows:

<Address>
<AddressLine1>22 Arcacia Avenue</AddressLine1>
<AddressLine2>Little Hampton</AddressLine2>
<AddressLine3>Dorset</AddressLine3>
<PostCode>D45 9JK</PostCode>
</Address>

The tags are the named elements in the angled brackets, e.g. <Address></Address>, with the closing tag always starting with </. Anything between the tags can then be either content, as in the text “22 Arcacia Avenue” between the <AddressLine1></AddressLine1> tags, or another tag.

You will note that the AddressLine1, AddressLine2, AddressLine3 and PostCode tags are all enclosed within the Address tags in the sample above. By putting tags within tags, complex data structures can be built up. The structure that has been built up with the XML tags for the address example is commonly called a “composite”.
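To make the idea of a composite concrete, here is a minimal sketch in Python that reads the address example and lists the tags nested inside it. It uses the standard library's xml.etree.ElementTree module; the function name read_composite is made up for illustration and is not part of ONIX or of any library.

```python
import xml.etree.ElementTree as ET

# The address composite from the example above.
ADDRESS_XML = """
<Address>
  <AddressLine1>22 Arcacia Avenue</AddressLine1>
  <AddressLine2>Little Hampton</AddressLine2>
  <AddressLine3>Dorset</AddressLine3>
  <PostCode>D45 9JK</PostCode>
</Address>
"""


def read_composite(xml_text):
    """Return the outer tag name and a dict of {inner tag: content}.

    Illustrative helper: it only handles a composite that is one level deep.
    """
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return root.tag, fields


name, fields = read_composite(ADDRESS_XML)
print(name)
for tag, content in fields.items():
    print(f"  {tag}: {content}")
```

The outer Address tag comes back as the composite's name, and each tag inside it becomes one entry, in the order it appeared in the file.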

In addition to storing content between elements it is also possible to store information within the tag itself in the form of “attributes”. For instance, in the example below a textformat attribute has been added and has been set to a value of 01. Attributes are useful for adding ancillary information about tag content.

<AddressLine1 textformat="01">22 Arcacia Avenue</AddressLine1>

Attributes are not used heavily in the ONIX messaging standard but they do appear.
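As a rough illustration of how software sees an attribute, the sketch below reads both the content and the textformat attribute of a single element using Python's standard xml.etree.ElementTree module. It also shows that XML requires attribute values to be quoted: the unquoted form is rejected by the parser. The helper name read_element is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

QUOTED = '<AddressLine1 textformat="01">22 Arcacia Avenue</AddressLine1>'
UNQUOTED = '<AddressLine1 textformat=01>22 Arcacia Avenue</AddressLine1>'


def read_element(xml_text):
    """Return (content, attributes), or None if the XML is not well-formed.

    Illustrative helper, not part of any ONIX toolkit.
    """
    try:
        element = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return element.text, dict(element.attrib)


print(read_element(QUOTED))    # content plus a dict holding textformat
print(read_element(UNQUOTED))  # rejected: the attribute value is not quoted
```

Note that the attribute value arrives as the text "01", not the number 1, so any leading zero is preserved.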

If you have never seen an ONIX message before, refer to the ‘A Sample Onix Message’ section, where you can see how XML is used to structure title information using tags and composites.

In and of itself, the ability of XML to structure and describe content in this way is useful, but the real power lies in the fact that the structure of an XML document can be defined by something known as a Document Type Definition (DTD). We don’t need to know the technical details of how a DTD works, only that it is itself a text document that specifies rules on how an XML document should be structured and, in particular, how data elements are ordered and interrelated. An XML document can be explicitly checked, or “validated”, against a DTD; if its structure differs from what the DTD expects, an error message is produced and the XML document is considered “invalid”.

If two parties have access to the same DTD, they both know all the rules for any XML document that conforms to it. One party can therefore pass information to the other knowing that the recipient will be able to interpret the XML content unambiguously, e.g. there will always be three address lines followed by a postcode.
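The sketch below gives a feel for what checking a document against shared rules involves. It is not a DTD validator: Python's standard library does not validate against DTDs, so the rule table EXPECTED_CHILDREN and the function check_structure are hand-rolled stand-ins invented for this example. They encode just one rule, that an Address holds three address lines followed by a postcode.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD:
# composite name -> the child tags it must contain, in order.
EXPECTED_CHILDREN = {
    "Address": ["AddressLine1", "AddressLine2", "AddressLine3", "PostCode"],
}


def check_structure(xml_text):
    """Return a list of problems; an empty list means the check passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Broken XML (e.g. a mismatched closing tag) fails before any rules apply.
        return [f"not well-formed: {exc}"]
    expected = EXPECTED_CHILDREN.get(root.tag)
    if expected is None:
        return [f"unknown root element <{root.tag}>"]
    actual = [child.tag for child in root]
    if actual != expected:
        return [f"<{root.tag}> children are {actual}, expected {expected}"]
    return []


GOOD = (
    "<Address>"
    "<AddressLine1>22 Arcacia Avenue</AddressLine1>"
    "<AddressLine2>Little Hampton</AddressLine2>"
    "<AddressLine3>Dorset</AddressLine3>"
    "<PostCode>D45 9JK</PostCode>"
    "</Address>"
)
# Same kind of data, but elements are missing and out of order.
BAD = (
    "<Address>"
    "<PostCode>D45 9JK</PostCode>"
    "<AddressLine1>22 Arcacia Avenue</AddressLine1>"
    "</Address>"
)

print(check_structure(GOOD))
print(check_structure(BAD))
```

A real validating parser does the same job far more thoroughly, driven by the DTD file itself rather than by rules typed into the program.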

And that is exactly what happens with ONIX messaging. EDItEUR maintains a DTD and makes it available to all publishers, distributors, trade partners and other interested parties, who use it to pass bibliographic information between themselves unambiguously.

The DTD effectively enshrines the ONIX standard insofar as it dictates which elements must be included and in what order.
Breaking down all of the information about a title into unambiguous tagged data and bundling it into an ONIX message makes the flow of information between computer systems easier to automate, which in theory makes the process cheaper and more efficient.

For example:

Consider a simple book title
“A history of Dungarvan - town and district”

The above can actually be split down so that computer systems can better sort and format it:

Title prefix = “A”
Title without prefix = “history of Dungarvan”
Subtitle = “town and district”
Title text case = “Sentence Case”

In an ONIX message the book title information looks like this:

<Title>
<TitleType>01</TitleType>
<TitleTextCase>01</TitleTextCase>
<TitleText>A history of Dungarvan</TitleText>
<TitlePrefix>A</TitlePrefix>
<TitleWithoutPrefix>history of Dungarvan</TitleWithoutPrefix>
<Subtitle>town and district</Subtitle>
</Title>
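Going the other way, here is a minimal sketch of how software might generate that Title composite from the separate components. It uses Python's standard xml.etree.ElementTree module. The function build_title and its parameter names are invented for this example, and the "01" code values are simply copied from the sample above rather than looked up in the ONIX code lists.

```python
import xml.etree.ElementTree as ET


def build_title(prefix, without_prefix, subtitle, title_type="01", text_case="01"):
    """Build a Title composite with the same tags, in the same order, as the sample.

    Illustrative helper; the default codes are copied from the sample message.
    """
    title = ET.Element("Title")
    ET.SubElement(title, "TitleType").text = title_type
    ET.SubElement(title, "TitleTextCase").text = text_case
    # The full title text is the prefix and the remainder joined by a space.
    ET.SubElement(title, "TitleText").text = f"{prefix} {without_prefix}"
    ET.SubElement(title, "TitlePrefix").text = prefix
    ET.SubElement(title, "TitleWithoutPrefix").text = without_prefix
    ET.SubElement(title, "Subtitle").text = subtitle
    return title


title = build_title("A", "history of Dungarvan", "town and district")
if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
    ET.indent(title)
print(ET.tostring(title, encoding="unicode"))
```

Because the tags are written by the program rather than typed by hand, every opening tag is guaranteed to get its matching closing tag.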

Since the title is now split into its constituent components, the recipient can more easily manipulate the information. They could use the TitleWithoutPrefix content to sort titles more sensibly, whereas the TitleText element may be more appropriate for display on a web site or in an advertisement. It sounds obvious, but before ONIX came along everybody was passing information around as big lumps of text in different formats, so you either displayed a title as it came (if you could find it in the file), re-keyed the information, or displayed nothing at all.
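The sorting benefit is easy to sketch. In the example below, two of the three titles are made up purely for illustration, and the wrapping Titles tag is a hypothetical container for the example rather than an ONIX element. Titles are sorted on TitleWithoutPrefix but displayed with their prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini catalogue: <Titles> is not an ONIX tag, and only the
# Dungarvan title comes from the example above.
CATALOGUE_XML = """
<Titles>
  <Title><TitlePrefix>The</TitlePrefix><TitleWithoutPrefix>lighthouse keeper</TitleWithoutPrefix></Title>
  <Title><TitlePrefix>A</TitlePrefix><TitleWithoutPrefix>history of Dungarvan</TitleWithoutPrefix></Title>
  <Title><TitlePrefix>An</TitlePrefix><TitleWithoutPrefix>atlas of rivers</TitleWithoutPrefix></Title>
</Titles>
"""


def sort_key(title):
    """Sort on the title without its prefix, ignoring upper/lower case."""
    return title.findtext("TitleWithoutPrefix", default="").casefold()


def display_text(title):
    """Rebuild the full title for display by rejoining prefix and remainder."""
    prefix = title.findtext("TitlePrefix", default="")
    rest = title.findtext("TitleWithoutPrefix", default="")
    return f"{prefix} {rest}".strip()


titles = ET.fromstring(CATALOGUE_XML).findall("Title")
sorted_display = [display_text(t) for t in sorted(titles, key=sort_key)]
naive_display = sorted(display_text(t) for t in titles)

print(sorted_display)  # ordered by atlas, history, lighthouse
print(naive_display)   # ordered by the leading A, An, The
```

Sorting the full display text instead would group titles by their leading article, which is rarely what a reader browsing a catalogue expects.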

The downside to splitting out title information is that there is lots of it. We generated 8 lines of XML just to store a title. Now imagine doing the same thing for all of the contributor, supplier, pricing, rights and categorization information for each of your titles, and you’ll start to appreciate the scale of the problem. There can be hundreds of XML tags describing a single title! Indeed, the sample ONIX message in the ‘A Sample Onix Message’ section covers only one title; it could be considered moderately rich in terms of the information being sent, and probably uses about 60% of the available ONIX tags.

By now you are probably realizing that it is simply not practical to do ONIX messaging without software that can properly structure and centralise your title information and, ideally, generate the XML automatically.
