Our devices are getting smarter, and developers are finding interesting ways to improve the effectiveness of applications that make our lives easier. For example, we can talk to Siri, Cortana, Alexa, or a range of other personal assistant apps, speaking a request or asking a variety of questions, and receive useful or perhaps intentionally funny responses. We also receive unintentionally funny responses or responses that are not what we are looking for, such as when Siri responded to a request for help with a gambling problem by providing a list of casinos. We are, after all, in the early stages of artificial intelligence. Currently, much of what our computers know still has to be defined by us humans.
Much of that intelligence is provided through markup. The richer our markup, the more control we have over our content. To illustrate, we can go beyond using markup to distinguish between paragraphs, lists, and other document constructs to defining multiple types of lists that contain different categories of information or need specific formatting (author lists, glossary lists, and so on). With that understanding, schemas have ballooned. DITA is the best-known example; the latest DITA specification defines over 600 elements. However, OASIS, the organization responsible for developing DITA, is not the only group to maintain a policy of inclusion. We’ve seen a number of proprietary schemas developed in this fashion—XML architects often err on the side of caution, creating elements that are rarely used or not yet needed but might one day be useful.
In addition to creating schemas with large vocabularies, XML architects have been known to create schemas with fairly complex grammar. Sometimes the structure of an XML document relies heavily on nested elements. Some schemas have a wide variety of required and optional elements and attributes. Furthermore, schema designs sometimes mix elements and attributes inconsistently, and those inconsistencies confuse authors.
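A small sketch shows how mixed element/attribute design forces extra work on anyone consuming the content. The `<author>` markup below is hypothetical, not taken from any real schema:

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact. Neither is wrong in
# isolation, but a schema that allows both styles confuses authors
# and complicates every downstream consumer.
attribute_style = '<author role="editor" name="A. Author"/>'
element_style = '<author><role>editor</role><name>A. Author</name></author>'

def author_info(fragment):
    """Extract (role, name) regardless of which style was used."""
    node = ET.fromstring(fragment)
    role = node.get("role") or node.findtext("role")
    name = node.get("name") or node.findtext("name")
    return role, name

print(author_info(attribute_style))  # ('editor', 'A. Author')
print(author_info(element_style))    # ('editor', 'A. Author')
```

The fallback logic in `author_info` is exactly the kind of defensive code a consistent schema would make unnecessary.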
The problem is that, for now, we rely on people to add XML markup to documents. Traditionally, XML authoring systems have been powerful yet difficult to use: either the system requires experienced, well-trained users or the authoring tool must be heavily customized. Syncro Soft’s <oXygen/> XML editor, for example, can be used with any XML schema you choose to define and is fairly easy for anyone with development experience to customize, but the out-of-the-box experience is not at all intuitive for someone new to XML. It’s easy for a new user to insert an element in an invalid position, and though <oXygen/> prompts the user with warnings and explanations, it can still be a frustrating experience for someone who lacks the context to interpret those warnings.
For these reasons, many organizations have resisted implementing XML-based workflows for their business-critical content. As a result, we’re in the midst of a counter-revolution intended to simplify document XML. We can see evidence of this counter-revolution in schema initiatives such as Lightweight DITA, whose proposals not only trim the number of elements to a bare minimum but also remove much of the nesting and other structural complexity. We also see the trend in tools like Simply XML and Quark XML Author, two XML authoring systems that let users keep authoring in the familiar environment of Microsoft Word. Quark Author, a web-based XML authoring system, takes XML simplicity to the next level with a lean XML schema called Smart Content on the back end and an easy-to-use interface on the front end.
The challenge, of course, is not to oversimplify, or the markup loses the power XML was intended to provide. For instance, CommonMark, a variant of Markdown, was conceived as an alternative to XML/HTML that lets users write in a plain text editor and still get some special rendering, but if you need anything more complex than emphasis or lists, you have to type the required HTML markup directly into the editor. CommonMark, then, has a very limited range of application.
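To see the limitation concretely, here is an illustrative CommonMark source file: the emphasis and list render natively, but a simple two-column table must be written as raw HTML inside the Markdown (the content is invented for this example):

```python
# A CommonMark document mixing native syntax with a raw-HTML fallback.
# Emphasis and lists are covered by CommonMark itself; the table is not,
# so the author has to hand-type HTML -- the limitation described above.
commonmark_source = """\
# Release notes

*Emphasis* and lists work natively:

- item one
- item two

<table>
  <tr><th>Feature</th><th>Status</th></tr>
  <tr><td>Tables</td><td>raw HTML only</td></tr>
</table>
"""
print(commonmark_source)
```

A CommonMark renderer passes the `<table>` block through untouched, so the author is effectively writing HTML by hand for anything beyond the core constructs.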
Life will continue to get easier for content creators as our tools get smarter, but while humans are responsible for the richness of intelligent content, a balance needs to be struck between simple and powerful. Go too simple, such as with CommonMark, and your authors lose the ability to add much context to the words written. On the opposite end of the spectrum, a schema too descriptive becomes difficult for authors to use effectively. The sweet spot is XML nirvana.
About Autumn Cuellar
Autumn Cuellar has had a long and happy history with XML. As a researcher at the University of Auckland in New Zealand, Autumn co-authored a metadata specification, explored the use of ontologies for advancing biological research, and developed CellML, an XML language for describing biological models. Since leaving the academic world, Autumn has been delighted to share her enthusiasm for XML in technical and enterprise applications. Previously at Design Science, her roles included MathML evangelism and working with standards bodies to provide guidance for inclusion of MathML in such standards as DITA and PDF/UA. Now at Quark Software, Autumn provides her XML expertise to organizations seeking to mask the XML for a better non-technical user experience.
Note: This article was originally published in the June 2017 issue of CIDM eNews.
It’s the Rise of the Machines: Artificial Intelligence is here and our world is getting better every day! Except, not yet, not really, and not without a lot of work. But, hey, that describes all deployments of automation from digital assistants to paint robots.
So…what will it take to make artificial intelligence (AI) a reality? In my whitepaper “The Critical Value of Smart Content for Artificial Intelligence and Machine Learning,” I investigate the state of AI today and highlight the one factor that will allow us to move beyond digital assistants to more powerful implementations of AI. In the new whitepaper I cover:
Lastly, I’ll define the ideal description of content that should feed an AI system. It’s known as Smart Content, and it will help us all achieve the highest return on investment when implementing Artificial Intelligence.
To learn more, download the whitepaper “The Critical Value of Smart Content for Artificial Intelligence and Machine Learning.”
About the Author
Dave White is Chief Technology Officer for Quark Software. An engineer by training with over two decades of experience defining standards for content automation, White is at the forefront of technologies shaping the future of content. He works with customers and partners across industries to develop and implement transformative solutions for creating, managing, publishing, and delivering business-critical content.
Regardless of your industry, the first step towards a successful content strategy is understanding your content. Although it can be a daunting process, the best way to truly understand your content is to conduct a content audit. A content audit will ultimately help you select the best solutions for improving how you create, manage, publish, and deliver business-critical information to customers, partners, and employees.
So how can you conduct a content audit to get the most value? We boiled down the process into eight critical steps.
Step 1: Define the Objectives and Scope
Before beginning a content audit, you need to define your business and end-user objectives as well as document your authoring and maintenance objectives. Where possible, your objectives should be defined in specific, measurable terms. For example, a possible business objective for a life sciences organization could be to reduce staff training time by five percent for the roll-out of new pharmaceutical products. A content consumer objective might be to reduce the time it takes for customer support teams to find relevant information to no more than seven seconds.
Step 2: Plan Surveys and Inventories
In the content audit, you are on a mission to answer key questions about your content: Who creates it? What content types exist? Where is it stored? How is it delivered?
To answer these questions you will need to create surveys and inventories for content teams within your organization to complete. In this step, you are creating the documentation that will help you learn how content teams are structured, including roles and responsibilities, content types created, and how final content is delivered.
Step 3: Perform Document and Infrastructure Inventory
In many companies, documents are stored in multiple places, which can be a source of inefficiency and result in errors. To understand where documents live, you will need to conduct a document inventory, which is a quantitative assessment of all content assets, including documents, graphics, charts, photographs, spreadsheets, etc. In this step, you will:
The information you gather here will help in step four.
Step 4: Document Inventory Information
Once you understand your content and where it’s stored, formalize your inventory with more detail about each content asset. You will need to collect and organize the following foundational information about each asset:
Step 5: Identify Guidelines and Standards
In this step, you will identify which guidelines and standards apply to your content. Here are some examples of possible guidelines and standards for document and content types:
Step 6: Conduct User and Author Surveys
Earlier, in step 2, you planned for user and author surveys. Now is the time to select users and authors for the survey. Be sure to:
When you survey your authors, include all individuals who are responsible for creating and updating documents that are included in the content audit. With the survey, you are looking for first-hand knowledge of why, how, and by whom content is created, approved, and stored.
Step 7: Perform the Document Audit
The document audit is a qualitative evaluation of the document set and requires the content surveys to be completed and responses analyzed. The purpose is to establish a baseline for the current state of the documents. You will find out:
If the documents you are auditing have multiple audiences, conduct separate audits for each audience.
Step 8: Perform Document Analysis
The document analysis follows the document audit and should consider all the results from the various surveys and direct observations. The document analysis determines the necessary next steps for implementing a content automation solution. Common issues highlighted during content audits include:
Step 9 and Beyond
Steps 1-8 will help you fully understand your content and highlight your best practices and areas for improvement. As a result of a content audit, many enterprise organizations confirm the urgency with which they must move away from ad-hoc content creation and management strategies to more formalized content automation processes.
Are you considering conducting a content audit but would like more information? Contact Quark or download the Beginner’s Guide to Content Automation and begin to learn how a content audit can enable you to start planning and designing your content strategy today!
Innovation in how you approach and overcome obstacles should have a lot to do with the path you want to travel. I enjoy hiking trails in the wilderness. Which trail I select will determine if my hike is a dud or remarkable, especially this time of year.
Hiking in the spring can be fraught with obstacles such as fallen timber and raging streams. I always anticipate some acceptable risk and am not surprised when trails require that I find a different way – a better way – to get back on the path to a remarkable experience.
Innovation in business is similar. A business that wants to be different – remarkable – won’t be on the same path as every other company. Innovative businesses are constantly looking for new paths that allow them to be more successful. Oftentimes these paths include complex obstacles to overcome.
One area where innovative leaders are forging new paths is in content strategy. Multi-channel content delivery is a competitive requirement today, which has added a tremendous amount of complexity to creating and managing content. This is especially true for the already complex content that helps organizations sell or run their businesses.
Like a flood across the trail, content processes today are fraught with obstacles. Valuable information is trapped in silos that become difficult for others to use. Or, content becomes unwieldy in heavy structures that add even more complexity. Well-intentioned content standards developed to solve business problems a decade ago can be a source of friction for transforming businesses today. How big of a bridge do you need to cross the stream?
Few businesses build products and services to be like everyone else’s. And customers are looking for outcomes that deliver a competitive advantage. How does a business differentiate its intellectual assets – human capital and valuable content – to deliver optimum results? Often overlooked, the adoption of best practices in content processes across an organization or the entire enterprise can be a keystone of new business growth.
Trying to overcome content and process obstacles in the same way as everyone else may get you to the other side. But, where do you go from there? What path are you on? What remarkable outcomes do you and your customers gain?
Reconsider your approach to content. Do you have to stay on the same path or can you choose a new, better direction? Build the right bridge on the right path that everyone in your organization can follow, not just a chosen few. Ensure it delivers value.
It’s time to examine best practices in business-critical content. With content automation, businesses can differentiate themselves by eliminating complexity in how they create, manage, publish, and deliver content. Customers can realize the promised outcomes in products and services without heavy structures.
Be brilliant! Build a bridge and get over it.
Quark is proud to announce that Mark Lewis, Content Strategist on our Professional Services team, has been named a Society of Technical Communication (STC) Fellow. Nominated by STC Fellows and elected by the STC Board of Directors, the rank of Fellow is the highest honor bestowed by the Society upon members.
Becoming an STC Fellow is a lifelong journey of achievement and throughout his career Mark has made distinct contributions to the STC community. Just a few of Mark’s accomplishments:
The STC honored Mark for “[his] tireless devotion, enthusiastically pioneering in content strategy/marketing and DITA; for [his] knowledge in promoting the profession, and willingness to work with anyone in the field of technical communication.”
Of his new STC rank, Mark said, “I personally owe much of my career to the STC and all the growth opportunities offered to me as a member. In every role I’ve held with the STC my goals have been to learn, grow, and give back to the next generation of technical communicators.”
Congratulations Mark on this well-deserved honor!
A funny thing happened on the way to XML’s world domination of the dissemination of written, document-oriented content: the data exchange world hijacked XML’s value and kept it for many years. Now JSON has the attention of web developers for data transactions – is XML in the way?
Getting Our Definitions Straight
For data (as used here, ‘data’ refers to relational, or otherwise highly structured, discrete information such as financial data), XML and JSON are two sides of the same data-description coin: call either side and the game can still be played. JSON works best for web-only developers, but learning XML isn’t too hard, and supporting resources are widely available, many of them free and open source.
For documents (as used here, ‘documents’ means a mix of authored prose, multimedia, and data meant for presentation to a content consumer), XML is still the dominant open-standard format for semantically rich content automation applications such as Quark Enterprise Solutions and for modern word processing tools such as the Microsoft Office suite – though the purpose, use, and value of XML differ significantly between these document-focused solutions.
XML became an official W3C recommendation in February 1998. At my previous company, two team members worked on the XML standard for several years alongside a who’s who of document and hypertext technologists. The whole idea of XML, as driven by Jon Bosak, then at Sun Microsystems, was to take the benefits of SGML (Standard Generalized Markup Language) and apply them to this new thing called “The World Wide Web.”
I remember how excited we all were when the spec was finally approved. So much attention was now being paid to our corner of the high-tech universe and the idea of having semantic XML content on the web was, to us at least, so clearly valuable. But then, the data jocks overwhelmed us document kids like a high school basketball team coming on the court after the band warms up the crowd.
EDI (electronic data interchange) methods have been around since the early days of computing. By the time XML became a recommendation, the data world was already building a new EDI method that took advantage of the web’s HTTP for the transport of messages and data payloads, with the data package built using XML syntax. This EDI method was called SOAP (Simple Object Access Protocol), and when Microsoft and others released it in 1999, it very quickly became the focus of the hype around XML’s value. All of us document folks were left playing the sad trombone while we continued our efforts to make semantically rich content’s value accessible and available to all (and still do today!).
Of course, all was not perfect for XML as an EDI solution. XML is a fairly verbose markup language, so an XML data payload can be several times larger than the data set it describes. XML also requires a robust parser, whose rules originally targeted document requirements rather than the needs of more compact data structures. And lastly, many browsers were slow to adopt XML as a web standard.
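A toy comparison makes the verbosity point concrete. The record and its field names are invented for illustration:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical trade record serialized both ways.
record = {"symbol": "QRK", "price": 42.5, "quantity": 100}

json_payload = json.dumps(record)

# Build the equivalent XML: every field name appears twice
# (opening and closing tag), which is where the extra bytes go.
root = ET.Element("trade")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_payload = ET.tostring(root, encoding="unicode")

print(json_payload)   # the JSON rendering of the record
print(xml_payload)    # the XML rendering, noticeably longer
print(len(xml_payload), len(json_payload))
```

For a payload this small the difference is trivial, but the ratio holds as records multiply, which is exactly why the data world looked for something leaner.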
An oversimplified answer to the question “What is the right tool?” goes something like this: use JSON for data transacted with dynamic web applications, and use XML for documents and for document-plus-metadata payloads.
Of course there are still many systems that offer SOAP APIs. Further still, the more modern REST (representational state transfer) web API doesn’t really care about the payload format, so many systems may provide both XML and JSON responses (as does Quark Publishing Platform – developer’s choice). But there are definitely gray areas when trying to determine if XML or JSON is the best fit.
Several standards are used for transacting files and metadata between parties, including RIXML in financial research and eCTD (the electronic Common Technical Document) in pharmaceutical regulatory submissions.
What these standards share is the use of XML to describe a package of documents in a way that lets the receiver of the package automate the handling of that package. For RIXML and eCTD, the payload mostly consists of PDF documents. The XML is used to hold the metadata that describes the package (producer, purpose, date, a description of each attached file, etc.). For the metadata “driver” or “backbone” file, XML made sense for many reasons, not the least of which was the contributors developing these standards were XML-knowledgeable folks and the tools and methodologies for creating these standards as XML were widely available.
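The “driver” or “backbone” idea can be sketched in a few lines. The element names below are invented for illustration only and do not reflect the actual RIXML or eCTD vocabularies:

```python
import xml.etree.ElementTree as ET

# A schematic metadata "driver" file: XML that describes a package
# of PDF attachments so the receiver can automate its handling.
package = ET.Element("package")
ET.SubElement(package, "producer").text = "Example Bank Research"
ET.SubElement(package, "published").text = "2017-06-01"
attachment = ET.SubElement(package, "attachment", {"href": "report.pdf"})
ET.SubElement(attachment, "description").text = "Quarterly sector outlook"

driver = ET.tostring(package, encoding="unicode")

# The receiving system parses the driver and routes the payload
# according to agreed-upon business rules.
parsed = ET.fromstring(driver)
print(parsed.findtext("producer"))
print(parsed.find("attachment").get("href"))
```

The real standards define far richer vocabularies (RIXML’s data dictionary runs to some 100 pages), but the pattern is the same: XML metadata describing an opaque document payload.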
Of course, a difference of 27 characters isn’t particularly meaningful on its own, but multiply the size of those messages by 10, 100, or 1,000 and the size difference becomes significant. Yegor concludes that JSON is great for data sent to dynamic web pages, but he recommends XML for all other purposes.
However, his arguments against JSON were already being addressed (as he admits toward the end of the article) as the JSON world brought more tools to the party such as JSONPath and JSON rules files with validating JSON parsers. JSON features are now reasonably on par with XML, though of course still focused on solving the challenges of transacting data.
If you are technically minded and curious, it might be worth reviewing the RIXML Data Dictionary; jump to page 21, where the data dictionary begins in earnest. It takes a little over 100 pages to document the entire data semantics structure of the main areas of concern (not including the “sidecars,” as RIXML calls them). The result is a metadata file describing the payload for a transaction of what is typically one or more PDF documents.
There is no reason why that structure couldn’t be represented as JSON, but there’s also not a particularly good reason to do so either. Ultimately what matters is which system receives the RIXML and document payload. In the case of most RIXML processing systems it is likely a backend server using Java or .NET code to parse the RIXML file and then update a database and file system according to agreed-upon business rules.
For example, take a distributor of financial research information that is produced and sent to the distributor by multiple different banks (the reason RIXML was created to begin with!). They receive the RIXML package, process it, store the information in their database or content management system and then present some portion of that information on a web page for subscribers to access. They don’t present the entire RIXML metadata – most of that would be useless to the research consumer. And a RIXML package isn’t really dynamic either – for a particular package, the metadata doesn’t change very frequently, if ever.
The distributor’s system isn’t going to rely on creating a subset of the original RIXML file to send to the browser. No, they’re going to query the system, because it is their single source of truth for content that is available. Delivering query results from the system as JSON to the browser is easier than creating or re-parsing or modifying the original RIXML file.
So an argument could be made to support both XML and JSON in RIXML (and by extension other metadata standards). Unfortunately for the JSON-only audience, the expense to recast those 100 pages of specifications for XML as JSON is non-zero, and a one-time conversion of the XML to JSON is not developer friendly. And for all of this additional effort the benefit would only apply to those that have yet to learn XML.
There are, and always will be, waves of new technology that provide an alternative to, overlap with, completely replace, or partially supplant an existing technology. At the time of XML’s development, its use for transacting data was secondary to its original purpose, which shows just how hungry the data world was for a better-defined transaction standard. That a more purpose-built, data-friendly format, JSON, was created at nearly the same time also highlights how much need there was for improvement and standardization in data transactions.
However, XML is still a fantastic technology for handling documents, metadata, and data, and it is especially adept at merging all three into a common structure that can be utilized by software for automation and by humans for authoring and consuming. If you are not exclusively processing discrete data transactions, there is a lot of benefit in understanding and utilizing XML and the rich toolsets that are available.
If you are purely a web-data jockey, it would still benefit you to learn XML and the associated tools because: a) you’re likely to run into a system that provides only XML; and b) having some XML skills extends your opportunities to the cool things that can be done in the document content domain.
Earlier this month, the announcement that Siri’s creators had successfully placed a pizza order with voice commands made a splash in the technology community, because ordering pizza is a fairly complex task for today’s machines. The successful pizza order is one realization of the Semantic Web envisioned by Tim Berners-Lee, a vision in which computers intelligently communicate with each other to automate complicated tasks. If the voice-commanded pizza order is any indication, the future is bright for increased productivity through digital assistants.
Much of the promise of the Semantic Web is built on a platform of metadata, which is used to identify data in a machine-readable fashion. For example, metadata can be used to differentiate between types of doctors so that someone asking their digital assistant for a nearby doctor isn’t directed to a doctor of veterinary medicine.
Metadata can be useful wherever a bit of machine intelligence is needed, including in an organization’s business-critical content. Take, for instance, online policy documents peppered with important terms defined in a glossary: it might be useful for key terms to be automatically linked to their definitions, or for a window with the definition to appear when your audience hovers over a term. For another example, financial reports may need mentions of public companies automatically linked to stock data.
In the above examples, glossary terms and public companies need to be marked as such for the system to identify where linking or hover behavior needs to be added. The problem is that leaving the marking responsibility to your subject matter experts (SMEs) is an undue burden. First, manual marking of metadata is an inefficiency that wastes your SMEs’ time. Second, the potential exists for metadata-appropriate content to be overlooked, which means that your audience may not receive the full benefit of your metadata-enriched processes.
Automated marking of metadata is one way in which Quark’s content automation expertise can help organizations enrich content processes — documents can be scanned for glossary terms, public companies, or other such important words or phrases and have the necessary metadata inserted at key points automatically using Quark software. Organizations can place the power of the automated metadata marking into their SMEs’ hands so that they can see where the metadata is being added and supplement or remove the marking as needed. Alternatively, the metadata can be added to content during the publishing process without any human intervention. Either way, your published content will be all the more rich for the intelligence added through metadata.
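A minimal sketch of such automated marking, assuming a hypothetical `<term>` element that a downstream publishing step could turn into a link or hover window (this is an illustration, not Quark’s implementation):

```python
import re

# A minimal sketch of automated metadata marking: scan prose for known
# glossary terms and wrap each occurrence in a hypothetical <term> tag.
glossary = {"metadata", "schema"}

def mark_terms(text, terms):
    # Longest terms first so multi-word terms would win over substrings.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: f'<term ref="{m.group(1).lower()}">{m.group(1)}</term>',
        text,
    )

marked = mark_terms("Every schema benefits from metadata.", glossary)
print(marked)
```

An SME could then review the inserted `<term>` markup, supplementing or removing it as needed, or the same pass could run unattended at publishing time.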