Striking a Balance: Simplicity vs. Richness in XML Architecture

Our devices are getting smarter, and developers are finding interesting ways to improve the effectiveness of applications that make our lives easier. For example, we can talk to Siri, Cortana, Alexa, or a range of other personal assistant apps, speaking a request or asking a variety of questions, and receive useful or perhaps intentionally funny responses. We also receive unintentionally funny responses or responses that are not what we are looking for, such as when Siri responded to a request for help with a gambling problem by providing a list of casinos. We are, after all, in the early stages of artificial intelligence. Currently, much of what our computers know still has to be defined by us humans.

How we provide much of that intelligence is with markup. The richer our markup, the more control we have over our content. To illustrate, we can go beyond using markup to distinguish between paragraphs, lists, and other document constructs to defining multiple types of lists that might contain different categories of information or need specific formatting (author lists, glossary lists, and so on). With that understanding, schemas have bloomed. DITA is the best known example; the latest DITA specification defines over 600 elements. However, OASIS, the organization responsible for developing DITA, is not the only group to maintain a policy of inclusion. We’ve seen a number of proprietary schemas developed in this fashion—XML architects often err on the side of caution, creating elements that are used rarely or where they might not be needed yet but may one day be useful.

In addition to creating schemas with a lot of vocabulary, XML architects have also been known to create schemas with fairly complex grammar. Sometimes the structure of the XML document relies heavily on nesting of elements. Some schemas will have a wide variety of required and optional elements and attributes. Furthermore, schema design sometimes mixes use of elements and attributes such that inconsistencies cause confusion.

The problem is that, for now, we rely on people to add XML markup to documents. Traditionally, XML authoring systems have been powerful yet difficult to use. Either the system requires users to be experienced and well trained or the authoring tool needs to have been highly customized. SyncroSoft’s <oXygen/> XML editor, for example, can be used with any XML schema you choose to define and is fairly easy for anyone with development experience to customize, but the out-of-the-box experience for the end user is not at all intuitive for someone new to XML. It’s easy for the new user to insert an element in an invalid position, and though <oXygen/> prompts the user with warnings and explanations, it can still be a frustrating experience for someone who doesn’t have the context for interpreting those warnings.

For these reasons, many organizations have resisted implementing XML-based workflows for their business-critical content. As a result, we’re in the midst of a counter-revolution in document XML intended to simplify XML. We can see evidence of this counter-revolution in schema movements, such as that of Lightweight DITA. Such proposals not only trim down the number of elements to a bare minimum but also remove much of the nesting and other structural complexities. We also see the trend to simplify document XML in tools like Simply XML and Quark XML Author, two XML authoring systems that allow users to continue authoring in the familiar environment of Microsoft Word. Quark Author, a web-based XML authoring system, uses both a lean XML schema called Smart Content on the back end and an easy-to-use interface on the front end to take XML simplicity to the next level.

The challenge, of course, is not to oversimplify or the markup loses the power that XML was intended for altogether. For instance, CommonMark, a type of MarkDown, was dreamed up as an alternative language to XML/HTML to allow users to type in a text editor and still get some special rendering, but if you need anything more complex than emphases or lists, you literally have to type out the required HTML markup into the editor. CommonMark, then, has a very limited application.

Life will continue to get easier for content creators as our tools get smarter, but while humans are responsible for the richness of intelligent content, a balance needs to be struck between simple and powerful. Go too simple, such as with CommonMark, and your authors lose the ability to add much context to the words written. On the opposite end of the spectrum, a schema too descriptive becomes difficult for authors to use effectively. The sweet spot is XML nirvana.

About Autumn Cuellar
Autumn Cuellar has had a long and happy history with XML. As a researcher at the University of Auckland in New Zealand, Autumn co-authored a metadata specification, explored the use of ontologies for advancing biological research, and developed CellML, an XML language for describing biological models. Since leaving the academic world, Autumn has been delighted to share her enthusiasm for XML in technical and enterprise applications. Previously at Design Science, her roles included MathML evangelism and working with standards bodies to provide guidance for inclusion of MathML in such standards as DITA and PDF/UA. Now at Quark Software, Autumn provides her XML expertise to organizations seeking to mask the XML for a better non-technical user experience.

Note: This article was originally published in the June 2017 issue of CIDM eNews.