
XML vs JSON

August 29, 2016 | Dave White

A funny thing happened on the way to XML’s world domination of the dissemination of written, document-oriented content: the data exchange world hijacked XML’s value and kept it for many years. Now JSON has the attention of web developers for data transactions. Is XML in the way?

Getting Our Definitions Straight
For data (as used here, ‘data’ refers to relational or otherwise highly structured, discrete information such as financial data), XML and JSON are two sides of the same data-description coin: either can be called and the game will be played. JSON works best for web-only developers, but learning XML isn’t too hard, and supporting resources are widely available, many of them free and open source.

For documents (as used here, ‘documents’ means a mix of authored prose, multimedia, and data meant for presentation to a content consumer), XML is still the dominant open-standard format, both for semantically rich content automation applications such as Quark Enterprise Solutions and for modern word processing tools such as the Microsoft Office suite, though the purpose, use, and value of XML differ significantly between these document-focused solutions.

History Lessons for Your Markup Language of the Day

XML became an official W3C Recommendation in February 1998. At my previous company, two team members worked on the XML standard for several years alongside a who’s who of document and hypertext technologists. The whole idea of XML, as driven by Jon Bosak, then at Sun Microsystems, was to take the benefits of SGML (Standard Generalized Markup Language) and apply them to this new thing called “The World Wide Web.”

I remember how excited we all were when the spec was finally approved. So much attention was now being paid to our corner of the high-tech universe and the idea of having semantic XML content on the web was, to us at least, so clearly valuable. But then, the data jocks overwhelmed us document kids like a high school basketball team coming on the court after the band warms up the crowd.

EDI (electronic data interchange) methods have been around since the early days of computing. By the time XML became a Recommendation, the data world was already building a new EDI method that used the web’s HTTP to transport messages and data payloads, with the data package written in XML syntax. This method was called SOAP (Simple Object Access Protocol), and when it was released by Microsoft and others in 1999, it very quickly became the headline example of XML’s value. All of us document folks were left playing the sad trombone while we continued our efforts to make the value of semantically rich content accessible and available to all (and still do today!).

Of course, all was not perfect for XML as an EDI solution. XML is a fairly verbose markup language, so an XML data payload can be several times larger than the data set it describes. XML also requires a robust parser whose rules were originally designed around document requirements, not the needs of compact data structures. And lastly, many browsers were slow to adopt XML as a web standard.

Just a few years later, in 2002, the JSON.org site was launched. JSON (JavaScript Object Notation) was a new data encapsulation format that, as the name suggests, was easier than XML to process with JavaScript in a web browser. While XML was verbose and required a comparatively complex parser, JSON was simpler and purpose-built for data and for the needs of data-processing code. JSON.org has a page that describes JSON as XML’s “Fat Free Alternative.” I assume this page was written around 2002, and it makes all the correct points, ultimately summarized as “XML is document-oriented. JSON is data-oriented.” And that is definitely true…by design!

JSON really came into its own when smartphones such as the first- and second-generation iPhone delivered a robust web browsing experience in 2007 and 2008. XML technology wasn’t included in those browsers, but JavaScript was, so JSON was a natural fit for building robust web application experiences that worked on smartphones. Since then, JSON has overtaken XML for data transactions between servers and web applications, and often for desktop and mobile native applications too!

It’s not XML vs. JSON, It’s Selecting the Right Tool for the Job

An oversimplified answer to “What is the Right Tool?” is something like this:

Of course, many systems still offer SOAP APIs. Furthermore, the more modern REST (Representational State Transfer) style of web API doesn’t really care about the payload format, so many systems provide both XML and JSON responses (as does Quark Publishing Platform, leaving the choice to the developer). But there are definitely gray areas when trying to determine whether XML or JSON is the better fit.
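Because REST leaves the payload format open, one service can hand the same resource to either kind of developer. Here is a minimal Python sketch of that idea, assuming the client states its preference in the HTTP `Accept` header; the `render` function and the sample `asset` record are hypothetical and are not Quark Publishing Platform’s actual API.

```python
import json
import xml.etree.ElementTree as ET


def render(resource: dict, accept: str) -> tuple[str, str]:
    """Serialize a flat resource as XML or JSON based on an Accept header.

    Deliberately simplified (hypothetical helper): it ignores q-values and
    wildcards and only checks whether XML was explicitly requested.
    """
    if "application/xml" in accept:
        root = ET.Element("asset")
        for key, value in resource.items():
            # Each field becomes a child element holding its value as text.
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    # Anything else falls back to JSON.
    return "application/json", json.dumps(resource)


# Hypothetical sample record.
asset = {"id": 42, "name": "Q3 Outlook", "status": "Published"}
print(render(asset, "application/json"))
print(render(asset, "application/xml"))
```

Real content negotiation also weighs quality values and wildcards such as `*/*`; this sketch simply treats anything that doesn’t mention `application/xml` as a request for JSON.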

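Several standards exist for transacting files and metadata between parties, including RIXML (Research Information Exchange Markup Language) for financial research and eCTD (electronic Common Technical Document) for pharmaceutical regulatory submissions.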

What these standards share is the use of XML to describe a package of documents in a way that lets the receiver automate the handling of that package. For RIXML and eCTD, the payload mostly consists of PDF documents, and the XML holds the metadata that describes the package (producer, purpose, date, a description of each attached file, etc.). For this metadata “driver” or “backbone” file, XML made sense for many reasons, not least that the contributors developing these standards were XML-knowledgeable folks and the tools and methodologies for creating such standards in XML were widely available.

Yegor Bugayenko has an excellent 2015 article on JSON and XML that opens with a snippet of JSON compared to the same data in XML:

[Image: the JSON snippet from Bugayenko’s article alongside the equivalent XML]

Of course, 27 characters isn’t a meaningful difference on its own, but multiply those messages by 10, 100, or 1,000 and the size difference becomes significant. Yegor concludes that JSON is great for data sent to dynamic web pages, but he recommends XML for all other purposes.
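To make the size point concrete without reproducing Yegor’s snippet, the Python sketch below serializes one hypothetical record (the field names are invented) as compact JSON and as element-only XML, then repeats it in batches. The exact numbers depend entirely on naming and structure; what the sketch shows is that XML’s closing tags add a roughly fixed overhead per record, so the absolute gap grows with volume.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical flat record; the field names are invented for illustration.
record = {"ticker": "ACME", "rating": "Buy", "target": 120.5}


def to_json(records: list[dict]) -> str:
    # Compact JSON: no whitespace after separators.
    return json.dumps(records, separators=(",", ":"))


def to_xml(records: list[dict]) -> str:
    # Equivalent XML: each field name appears twice (opening and closing tag).
    root = ET.Element("reports")
    for r in records:
        item = ET.SubElement(root, "report")
        for key, value in r.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


for n in (1, 10, 100, 1000):
    batch = [record] * n
    j, x = len(to_json(batch)), len(to_xml(batch))
    print(f"{n:>5} records: JSON {j:>6} chars, XML {x:>6} chars, difference {x - j}")
```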

However, Yegor’s arguments against JSON were already being addressed (as he admits toward the end of the article) as the JSON world brought more tools to the party, such as JSONPath and schema files (JSON Schema) used with validating JSON parsers. JSON’s features are now reasonably on par with XML’s, though still focused on solving the challenges of transacting data.
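As a rough illustration of that parity, the Python sketch below runs the same query (“tickers of every report rated Buy”) against hypothetical data in both formats. Python’s standard library has no JSONPath engine, so the JSON side is written as a plain comprehension, with a JSONPath-style expression shown only in a comment; filter syntax varies somewhat between JSONPath implementations.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical data in both formats.
xml_doc = """<reports>
  <report rating="Buy"><ticker>ACME</ticker></report>
  <report rating="Hold"><ticker>GLOBEX</ticker></report>
  <report rating="Buy"><ticker>INITECH</ticker></report>
</reports>"""

json_doc = """{"reports": [
  {"rating": "Buy", "ticker": "ACME"},
  {"rating": "Hold", "ticker": "GLOBEX"},
  {"rating": "Buy", "ticker": "INITECH"}
]}"""

# XML: ElementTree supports a limited XPath subset, including attribute predicates.
root = ET.fromstring(xml_doc)
xml_buys = [t.text for t in root.findall("./report[@rating='Buy']/ticker")]

# JSON: a JSONPath library would accept something like
#   $.reports[?(@.rating == 'Buy')].ticker
# Here the same filter is written directly against the parsed structure.
data = json.loads(json_doc)
json_buys = [r["ticker"] for r in data["reports"] if r["rating"] == "Buy"]

print(xml_buys)
print(json_buys)
```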

Finally, choosing the “Right Tool” also means considering who is selecting it. A young web developer who has only ever used JavaScript with JSON will find XML unwieldy, and his productivity will take a major hit if he has to learn how to process XML. An author who writes technical documentation in XML would find a long, deeply nested technical document in JSON almost impossible to work with, although some document tools do use JSON as their primary storage format. For the unwieldy world of prose, raw XML is a little friendlier to the viewer, especially if stored with proper whitespace applied (though I may have some experience bias). Of course, no end user should be required to look at the raw file format of either; that should strictly be developer territory.
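Mixed content, where prose and inline markup interleave, shows why long documents are awkward in JSON. JSON has no built-in way to represent text interleaved with child elements, so any mapping has to invent one. The Python sketch below uses one hypothetical encoding (the `tag`/`attrs`/`content` shape is an assumption, not a standard) to turn a single XML sentence into a JSON tree.

```python
import json
import xml.etree.ElementTree as ET

# A sentence of hypothetical technical prose with inline markup.
xml_para = '<p>Click <b>Save</b> and then see <xref href="ch2"/> for details.</p>'


def to_json_tree(elem: ET.Element) -> dict:
    """Convert an element, including its mixed text, into a JSON-friendly tree.

    ElementTree stores text before the first child in .text and text after
    each child in that child's .tail; both become plain strings in "content".
    """
    content: list = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        content.append(to_json_tree(child))
        if child.tail:
            content.append(child.tail)
    node: dict = {"tag": elem.tag}
    if elem.attrib:
        node["attrs"] = dict(elem.attrib)
    if content:
        node["content"] = content
    return node


tree = to_json_tree(ET.fromstring(xml_para))
print(xml_para)
print(json.dumps(tree, indent=2))
```

Even this one sentence needs nested objects and a list of text fragments; a full chapter of technical documentation would multiply that nesting many times over.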

A Little More about RIXML – A Good Test Case

If you are technically minded and curious, it might be worth reviewing the RIXML Data Dictionary and jumping to page 21, where the data dictionary begins in earnest. It takes a little over 100 pages to document the full data semantics of the main areas of concern (not including the “sidecars,” as RIXML calls them). All of that structure goes into a metadata file describing the payload of a transaction, which is typically one or more PDF documents.

There is no reason that structure couldn’t be represented as JSON, but there’s also no particularly good reason to do so. Ultimately, what matters is which system receives the RIXML and document payload. For most RIXML processing systems, that is likely a backend server running Java or .NET code to parse the RIXML file and then update a database and file system according to agreed-upon business rules.
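That parse-then-store pattern can be sketched briefly. The example below is in Python for brevity rather than Java or .NET, and the manifest is a drastically simplified, hypothetical stand-in: real RIXML uses its own namespaced schema, and the element names, table layout, and `ingest` function here are invented for illustration. Business rules beyond “one row per attached file” are left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified package manifest (NOT the real RIXML schema).
MANIFEST = """<package producer="Example Bank" date="2016-08-01">
  <purpose>Equity research</purpose>
  <file name="acme-q3.pdf">Q3 outlook for ACME</file>
  <file name="acme-model.pdf">Valuation model appendix</file>
</package>"""


def ingest(manifest: str, db: sqlite3.Connection) -> int:
    """Parse the manifest and store one row per attached file."""
    root = ET.fromstring(manifest)
    producer = root.get("producer")
    date = root.get("date")
    purpose = root.findtext("purpose")
    rows = [
        (producer, date, purpose, f.get("name"), f.text)
        for f in root.findall("file")
    ]
    db.executemany(
        "INSERT INTO documents (producer, date, purpose, filename, description)"
        " VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    db.commit()
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE documents "
    "(producer TEXT, date TEXT, purpose TEXT, filename TEXT, description TEXT)"
)
print(ingest(MANIFEST, db), "file(s) ingested")
```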

Is there a case for parsing the RIXML in a web browser using JavaScript? Possibly, but I’ll argue that the browser doesn’t want the entire RIXML file, it only wants small portions of it.

For example, take a distributor of financial research that is produced and sent to it by multiple banks (the reason RIXML was created to begin with!). The distributor receives the RIXML package, processes it, stores the information in its database or content management system, and then presents some portion of that information on a web page for subscribers. It doesn’t present the entire RIXML metadata; most of that would be useless to the research consumer. And a RIXML package isn’t really dynamic either: for a particular package, the metadata rarely changes, if ever.

The distributor’s system isn’t going to build a subset of the original RIXML file to send to the browser. Instead, it will query its own system, because that system is the single source of truth for the content that is available. Delivering query results from that system to the browser as JSON is easier than creating, re-parsing, or modifying the original RIXML file.
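A rough Python sketch of that approach, assuming a hypothetical `research` table that has already been populated from incoming RIXML packages: the `research_listing` function queries the system of record and returns only the fields a subscriber page needs, as JSON. The schema and sample rows are invented for illustration.

```python
import json
import sqlite3

# Hypothetical system of record, already populated from RIXML packages.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets rows be converted to dicts by column name
db.executescript("""
CREATE TABLE research (id INTEGER PRIMARY KEY, title TEXT, producer TEXT,
                       published TEXT, analyst_notes TEXT);
INSERT INTO research (title, producer, published, analyst_notes) VALUES
  ('Q3 outlook for ACME', 'Example Bank', '2016-08-01', 'internal only'),
  ('Sector review: Utilities', 'Sample Securities', '2016-08-15', 'internal only');
""")


def research_listing(conn: sqlite3.Connection, producer: str) -> str:
    """Return only subscriber-facing fields, serialized as JSON."""
    rows = conn.execute(
        "SELECT id, title, published FROM research WHERE producer = ? "
        "ORDER BY published DESC",
        (producer,),
    ).fetchall()
    return json.dumps([dict(row) for row in rows])


print(research_listing(db, "Example Bank"))
```

Internal fields such as `analyst_notes` never leave the server, and the original RIXML file is never touched.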

However, JSON isn’t only friendly to JavaScript and browsers; it’s becoming more popular for many kinds of data transactions, including server-to-server. Modern web developers tend to learn both browser-side JavaScript and one or more server-side languages or platforms such as PHP, Java, or .NET. These developers are proficient with JSON and may have limited exposure to XML for data. Therefore, when asked to build a new RIXML processing system, these web developers would be more productive if the RIXML were available as JSON, and they might even be happier to avoid learning XML.

So an argument could be made for supporting both XML and JSON in RIXML (and, by extension, other metadata standards). Unfortunately for the JSON-only audience, the cost of recasting those 100 pages of XML specifications as JSON is non-zero, and a one-time mechanical conversion of the XML to JSON is not developer friendly. And for all this additional effort, the benefit would apply only to those who have yet to learn XML.
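One reason a mechanical conversion is unfriendly: a generic converter has to invent conventions for attributes, text, and repeated elements, and the resulting JSON shape can change with the data. The Python sketch below uses one such convention; the `@`-prefixed attributes and `#text` key are a common pattern but not a standard, and `naive_xml_to_json` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET


def naive_xml_to_json(elem: ET.Element):
    """Convention-based XML-to-JSON mapping (one of many in use).

    Attributes get an "@" prefix and text goes under "#text". Repeated child
    elements become a list, while a single child stays a plain object.
    """
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = naive_xml_to_json(child)
        if child.tag in node:
            # Second occurrence of a tag: promote the existing value to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # text-only element collapses to a bare string
        node["#text"] = text
    return node


one = ET.fromstring('<package><file name="a.pdf">Report</file></package>')
two = ET.fromstring(
    '<package><file name="a.pdf">Report</file><file name="b.pdf">Model</file></package>'
)
print(json.dumps(naive_xml_to_json(one)))
print(json.dumps(naive_xml_to_json(two)))
```

Code consuming this output must handle `file` as either an object or a list depending on how many files a package happens to contain, which is exactly the kind of friction a hand-designed JSON format would avoid.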

Long Story Bygones

There are, and always will be, waves of new technology that offer an alternative to, overlap with, completely replace, or partially supplant an existing technology. When XML was developed, its use for transacting data was secondary to its original purpose, which shows just how hungry the data world was for a better-defined transaction standard. That a more purpose-built, data-friendly format, JSON, was created at close to the same time also highlights how much improvement and standardization data transactions needed.

However, XML is still a fantastic technology for handling documents, metadata, and data, and it is especially adept at merging all three into a common structure that software can use for automation and humans can use for authoring and consuming. If you are not exclusively processing discrete data transactions, there is a lot of benefit in understanding and using XML and the rich toolsets available for it.

If you are purely a web-data jockey, it would still benefit you to learn XML and its associated tools because: a) you’re likely to run into a system that provides only XML; and b) some XML skills would open up opportunities to do cool things in the document content domain.
