Why XSLT and XQuery Are Coming Back
by Kurt Cagle, Editor, The Cagle Report
Back around the turn of the millennium, I was writing books on XML. The basic XML language had been standardized by 1998, and in short succession you saw four related standards: XSLT for transformations, XPath for selection, XQuery for querying and processing, and XSD for schema definition. An XML version of HTML 4, called XHTML, had been standardized as well, and browsers built the earliest XSLT standard (1.0) into their infrastructure.
However, there was also push-back. The new Mozilla team was in general fairly hostile to XML, primarily because it competed with the idea of HTML as a fully fledged language in its own right, and because of a basic belief that imperative programming (in this case JavaScript) was superior to declarative approaches to coding. There have been many arguments over the years about why HTML was better than XHTML, much of it resting on the claims that XML was unfamiliar to developers, that it didn't mesh well with the requirements of streaming, that XML-based tools were hard to use, and that XML was in general less efficient at encoding information than JSON.
None of these claims has really stood the test of time, but it hasn't much mattered: very few XML implementations have made headway in the browser space, and with the advent of Node.js, XML would seem to be in the process of disappearing off the face of the planet.
However, while JSON has demonstrably become a de facto standard for interchange, more and more people are beginning to question the benefits that JSON has brought to the table, and whether in fact there might be some benefit to XML and its "stack" after all. Most comparisons that JavaScript developers, in particular, make to XML are based upon a toolset that is now twenty years old.
Suppose, for instance, that a modern toolset for working with XML were made available, one compliant with standards that have improved in precisely the same ways that JSON and JavaScript have, utilizing many of the same tools and techniques: template literals, map/reduce operations on arrays, higher-order functions, immutable programming, streaming architectures, and more. Suppose you could use these tools in the browser, in Node.js, even in Python. Suppose that such tools could even work with JSON, allowing you to consume JSON and produce JSON through a templating architecture (or even embedded XQuery extended with JavaScript functions).
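To make the parallel concrete, here is a minimal sketch of those same functional idioms - parsing JSON, filtering, sorting with a higher-order key function - expressed in XPath 3.1 from within JavaScript. It assumes Saxon-JS 2 installed from npm as "saxon-js" and that its XPath.evaluate options behave as documented; the JSON sample is invented for illustration:

    const SaxonJS = require("saxon-js");

    const json = '[{"name": "a", "price": 4}, {"name": "b", "price": 11}]';

    // parse-json() turns JSON text into XDM arrays and maps; sort() takes
    // a higher-order key function, much like Array.prototype.sort in JS.
    const result = SaxonJS.XPath.evaluate(
      `let $items := parse-json($json)?*
       return sort($items, (), function($i) { $i?price })[?price gt 5]?name`,
      null,
      { params: { json: json } }
    );
    console.log(result); // -> "b"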
Recently, something wonderful happened. Michael Kay, the founder of Saxonica and the man who has quietly shepherded the XSLT standard since the late 1990s, released Saxon-JS, a version of the Saxon engine built for JavaScript environments. This is not the first time that he's created a JavaScript-based XSLT engine, but with Saxon-JS he did two things that I believe will bring an XML renaissance: first, he made the product free (it has a license, but that license controls the source code, not its use), and second, he made it capable of working both in the browser and within the popular Node.js environment. Saxon can similarly be used in Java environments, which also makes it usable from Python and other languages through Java intermediaries.
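As a flavor of what this looks like in practice, here is a minimal Node.js sketch. It assumes the stylesheet has first been compiled to Saxon's SEF format with the xslt3 command-line tool (npx xslt3 -xsl:to-html.xsl -export:to-html.sef.json -nogo); the file names are illustrative:

    const SaxonJS = require("saxon-js");

    SaxonJS.transform({
      stylesheetLocation: "to-html.sef.json", // pre-compiled stylesheet
      sourceLocation: "article.xml",          // input document
      destination: "serialized"               // return the result as a string
    }, "async").then(output => {
      console.log(output.principalResult);
    });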
Why is this news? Software is released every day, after all. What makes this significant is that XSLT 3 is remarkably effective at transforming documents of any sort - whether text, CSV, XML, JSON, RDF, or any other kind of text. Most contemporary document formats, such as Microsoft Word, Google Docs, and LibreOffice files, are actually stored as zipped packages of XML. What's more, content in your organization is now coming from dozens of different sources, but if you can transform any content into a metadata document (note I didn't necessarily say XML metadata document), then you can transform it with XSLT so that it's appropriate for mobile devices, for documentation services, for web content, even for embedding as metadata into images and other media.
What Saxon-JS does is bring this transformative capability back into the JavaScript ecosystem. JavaScript itself has very limited transformative capabilities. The closest that you have today are template literals, and while such literals are useful (I'm a big believer in them in my own work), they offer only a small fraction of the flexibility that XSLT 3 does. If you manage multiple content and digital asset management systems, XSLT transformations give you a veritable toolbox for handling everything from converting Markdown into HTML, to analyzing content for sentiment keywords or entity extraction, to extracting data from tables and even diagrams in scanned books.
XSLT 3 is a language for traversing trees (or sets of trees) - whether they are text, HTML, XML, JSON, CSV, or even RDF - and generating output in as many forms.
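By way of a sketch of that idea, here is an XSLT 3.0 stylesheet that consumes JSON and emits HTML. It is shown as a JavaScript string purely for readability; in practice it would live in its own file (say, json-list.xsl, an invented name) and be compiled to SEF form with the xslt3 tool before Saxon-JS runs it:

    // XSLT 3.0: JSON in, HTML out. parse-json() turns the JSON text into
    // XDM maps and arrays, which templates traverse like any other tree.
    const stylesheet = `
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        expand-text="yes">
      <xsl:param name="input" as="xs:string"/>
      <xsl:output method="html"/>
      <xsl:template name="xsl:initial-template">
        <ul>
          <xsl:for-each select="parse-json($input)?*">
            <li>{?name}: {?price}</li>
          </xsl:for-each>
        </ul>
      </xsl:template>
    </xsl:stylesheet>`;

    // After compilation, the transform would be invoked along these lines:
    // SaxonJS.transform({
    //   stylesheetLocation: "json-list.sef.json",
    //   initialTemplate: "xsl:initial-template",
    //   stylesheetParams: { input: '[{"name":"tea","price":4}]' },
    //   destination: "serialized"
    // }, "async").then(r => console.log(r.principalResult));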
This becomes especially powerful with the ability to run such transformations in Node.js and to pass in objects that can access everything from machine learning services to SPARQL queries to data from a broad spectrum of sources. One common limitation of earlier versions of XSLT was the inability to pass headers and other metadata into document request or retrieval calls. Now, however, it's possible to wrap JavaScript's fetch() command in async/await calls and hand the result to the transform, with XSLT then able to process the resulting document (even if it's JSON) and determine from contextual clues what the document content is.
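Here is a hedged sketch of that pattern: Node.js performs the authenticated request (the endpoint, headers, and parameter names are all invented for illustration; global fetch() assumes Node 18 or later), then hands the payload to the stylesheet as a parameter:

    const SaxonJS = require("saxon-js");

    async function transformRemote() {
      // The HTTP details - headers, authorization - stay in JavaScript,
      // where they are easy to express...
      const response = await fetch("https://example.org/api/items", {
        headers: { "Accept": "application/json", "Authorization": "Bearer <token>" }
      });
      const payload = await response.text();

      // ...while the transformation logic stays in XSLT, which receives
      // the JSON text as a stylesheet parameter and parses it there.
      const result = await SaxonJS.transform({
        stylesheetLocation: "items.sef.json",
        initialTemplate: "xsl:initial-template",
        stylesheetParams: { payload: payload },
        destination: "serialized"
      }, "async");
      return result.principalResult;
    }

    transformRemote().then(console.log);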
Into this mix also comes the frequent commentary that XQuery 3 is better than XSLT. As of XSLT 3.0, however, that's largely a moot point, because XPath 3.1 absorbed much of XQuery's expression syntax - including the for and let expressions at the heart of FLWOR (where and order by remain XQuery-only) - and these can be used directly in value and variable expressions. This means that you can start with a simple XSLT framework that does little more than wrap an XQuery-style "script" and provide a context. You can also utilize the fn:transform() function (as distinct from the xsl:transform element) to invoke a transformation dynamically, with the stylesheet built on the fly or pulled in from an external resource.
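As a small, hedged illustration (again via Saxon-JS's XPath.evaluate, with invented sample data), here are for and let expressions doing FLWOR-style work inside plain XPath 3.1:

    const SaxonJS = require("saxon-js");

    const names = SaxonJS.XPath.evaluate(
      `let $items := parse-json($json)?*
       return
         for $item in $items
         return upper-case($item?name)`,
      null,
      { params: { json: '[{"name":"alpha"},{"name":"beta"}]' },
        resultForm: "array" }
    );
    console.log(names); // -> [ "ALPHA", "BETA" ]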
XSLT 3.0 also incorporates the two most heavily used structures in the JSON/JavaScript arsenal: maps and arrays. Maps correspond to JSON objects and arrays to JSON arrays, and both are immutable: operations that "modify" them return new values rather than changing the originals, which fits naturally with a functional, side-effect-free style of programming.
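A quick sketch of the syntax, run through Saxon-JS (the order data is invented):

    const SaxonJS = require("saxon-js");

    // map {} and [] are XPath 3.1 constructors; ?key and ?* are lookups.
    const total = SaxonJS.XPath.evaluate(
      `let $order := map { "customer": "Ada", "lines": [3, 4, 5] }
       return sum($order?lines?*)`
    );
    console.log(total); // -> 12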
Finally, it is possible to write transformations that actually facilitate dynamic changes within web browsers, by writing templates that capture events and use the resulting event targets to make changes to the page's DOM, in a manner similar to how React works with its virtual DOM. This becomes especially useful from a development standpoint: the XSLT in this case is a running process, not simply a static transformation, so you can create (or aggregate) a data model within it and make access to that data feasible at any level.
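Saxon-JS exposes this through its interactive XSLT extensions (the ixsl namespace inherited from Saxon-CE). A hedged sketch, with invented element ids, again written as a string for readability; it would be compiled to SEF and loaded in the page with SaxonJS.transform():

    const interactive = `
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:ixsl="http://saxonica.com/ns/interactiveXSLT"
        expand-text="yes">

      <!-- Fires whenever the user clicks the button with id="refresh" -->
      <xsl:template match="button[@id = 'refresh']" mode="ixsl:onclick">
        <!-- Writing to href="#output" rewrites that element's content
             in the live page -->
        <xsl:result-document href="#output" method="ixsl:replace-content">
          <p>Clicked at {current-time()}</p>
        </xsl:result-document>
      </xsl:template>
    </xsl:stylesheet>`;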
One of the central problems that companies in the data space run into is integration. XSLT was the foundation, many years ago, of Microsoft's BizTalk Server (I know; I worked on it way back when). I feel that XSLT 3 today could very easily facilitate this same process with far more sophistication and across far more formats. I look forward to writing more about my explorations with Saxon-JS in subsequent newsletter postings.
Connections
I've started bookmarking posts that I think may be relevant to you, my gentle reader. As I ramp up, I'm going to share the highlights of these for further exploration. A LinkedIn tip: when you see a post that you like, click on the three dots at the upper right of the post in the main page, then click Save.
Once you have saved a post, you can get back a list of all of your saved content by going to the LinkedIn feed page (the home page). In the upper left-hand corner you'll see the identity box, and at the bottom of this box, Saved Items. Click on this to retrieve the items you've bookmarked. I find this invaluable not only for retrieving articles that I want to review, but also for compiling lists, such as the ones below, of items that others may find interesting, amusing, informative, or thought-provoking. A la cuisine!
Kurt Cagle is the editor of The Cagle Report, and a longtime blogger and writer focused on the field of information and knowledge management.
The Cagle Report is a daily update of what's happening in the Digital Workplace. He lives in Issaquah, Washington with his wife, kids, and cat.
Comments

Manager at Parexel · 4y
Thanks for the article. It is good that Saxonica created a library for JS, and I hope it becomes popular and continues to be developed. But I'm concerned about one thing: Saxon-JS has a custom public license which, from my point of view, is not compatible with open-source licenses, since by definition an open-source license allows modification of the source code. So if you are writing a project that uses MIT/GPL v3/Apache 2.0, you formally cannot use this library. I think a lot of small projects will simply ignore that, but for those who try to be strict with their licenses it can be a no-go. In fact, XML as a format is used a lot in JS: libraries like xml2js and xmlbuilder have around 10 million weekly downloads. I agree with you that things like XSLT/XSD are not that popular. As a simple statistic from npmjs: 30k packages for the keyword JSON, 3k for XML, 111 for XSLT, 70 for XSD. Let's see if Saxon-JS helps improve that situation. For a long time I have been looking for a JS library that can validate an XML document against a schema; there are some solutions based on libxml2 binaries compiled to JS, but they have certain disadvantages. As I understand it, Saxon-JS does not support this.
Software Product Management and Services Professional · 4y
Good article. Thank you for the insights.
Sales Leadership: Better Business Thru Technology · 4y
Very interesting, and I'd like to know more about the implications. "Quite a while ago" I was involved with some software built on SGML that pioneered the ability to separate content instance from semantics from formatting. Large clients loved this, and it was a big revelation. Then SGML evolved into XML, but along the way the semantics-as-first-class-citizen idea somehow got lost (nicely described by one guru as "the return of the hierarchical database" - a very bad thing - even as ontology began its slow rise to possibility). XSLT also evolved, with promise again along these lines. The idea was that a business would be able to store content in a relational database according to semantics ("shredded") and then deploy it on demand via XSLT, systematically, in real time. I notice you mention "data model" above. Is there more that can be said about the very positive business implications of your topic today? Thanks!
Ontological Matching, Entity Resolution and Data Integration Consulting · 4y
Kurt Cagle I think there is an over-presumption (if there is such a word) here that there are, or were, rational and technical reasons why these technologies fell into disuse in the first place. Today people jump through all manner of hoops to declare a technology schemaless, or to fashion a schema that makes data queryable with SQL, when simply encoding the data as XML instantly gives you your data in a schemaless, queryable format. There has never been a rational reason for discarding a technology with such capability, and I don't know how helpful it is not to acknowledge that.
Senior Director, Data Science Development at CDISC · 4y
Dmitry Kolosov