Ascii1000D: Lightweight Markup S1000D-style, part 1
In this piece we discuss the background of CCS (Component Content Systems, aka topic-based authoring) with regard to S1000D, and how we can replicate the business architecture of S1000D in a lightweight-markup (Asciidoc) "Docs-as-Code" publishing system.
Follow along with the public-facing Ascii1000D sandbox: https://github.com/lopsotronic/Ascii1000D
Introduction: What is the Bare Functionality of Topic Based Authoring?
[New software licenses at ten grand per person per month]
+
[1 additional dedicated tools Level IV salaried headcount per five writers]
=
[saving an hour per person per day].
Topic-based content systems - when attempted by actual real writers with a rapidly changing new product - have an occasional tendency to mutate into money pits.
See above calculation for the best-case scenario.
Punch those numbers in your calculator and it'll make a frowny face. Add in that the end product - the PDF "book" - is, well, worse, no getting around that, and it's not long before management starts giving you stinkeye. Unless you truly are saving gargantuan amounts of money with topic based authoring.
There are lots of reasons for this: the huge range in writer skill sets, the woes of XML-based publishing, and something I call "The Applicability Trap," among others. The first one and the last one can (and will) bite you regardless of what tool/markup/vendor you're using - that's a story for later.
So, topic-based authoring - particularly conditional, applicability-driven content - is it doomed? No. There's good, quantifiable functionality in these Topic Based Specifications. Particularly - well, exclusively - for those organizations who share a lot of content between deliverables. So what are the bare necessities? Transclusion, Partial Transclusion, and Conditionals. That's it.
That functionality expands out into a handful of more granular categories.
So there's some of our useful functions from component content systems.
For the functions that are worthwhile, what's the lowest cost point at which we can make these happen, while still getting old-fashioned print output?
Time to Get Stupid
Right around now I'm going to say something that most content professionals would call pretty stupid. Maybe even monumentally, astonishingly stupid. So stick with me a second. Let's toss out our notions about strict validation, content information typing, and schema definition languages. No one wants to pay for those, anyway. It's time to get stupid.
So: lowest cost point at which you can do component authoring? Asciidoc.
We're keeping all the aero/def business architecture from S1000D (SNS, incodes, etc.), but we're going to do the actual work with standard programmer tools: text files, Atom, VSC (Visual Studio Code), git, and CLIs (command line interfaces, or shell interfaces like Bash or MS PowerShell). Modern text editors - VSC and Atom - are awesomely powerful compared to a typical dedicated XML editor. Asciidoc-to-DocBook interoperability lets us push to the XML world if we need to, but we don't need to rebuild our entire business around namespaces, custom parsers, or mixed-mode XSD/DTD validations. HTML+JavaScript gives you some extremely sophisticated IETM behavior all the way to L5, while you generate PDFs using a range of technologies, from the simple (the standard `asciidoctor-pdf` gem) to the complex (FOPUB, which integrates the DocBook-XSL PDF processor).
Anyway, we just want docs, so let's use a doc format. It's simple. It's stupid. Text files talking to text files.
Non-Apology Apology:: I'm not advocating we all start writing aerospace publications in lightweight markup languages. Aircraft are complicated things, and the modern sustainment pipeline makes them fifty times as complicated. Integrating lightweight markup with that, from scratch, is going to take work, and if you have a unified ERP/LSA environment then that work's already been done for you. Center your pubs operation around that instead. Having said that, if you're in a situation where 1) you have no money for tooling, 2) you need to push content in a hurry, 3) you have non-integrated business information all over the place (PDM, LSA, CAD, ERP, LMS, etc), and 4) you have a requirement for multiple active contributors in a topic-based system, then maybe take a page from the software development world. Lightweight markup in text files, programmer-style text editors, and off the shelf version control systems will get the job done, and it's not a galaxy away from how we would do things in S1000D anyway. Alright! Editorial non-apology apology is over.
Let's review some useful Asciidoc equivalents of S1000D constructs. Today, let's take a look at Asciidoc equivalents of Publication Modules (PMs), Applicability, and Common Information Repositories (CIRs).
Simple Ascii1000D in Action
No ACT/PCT/CCT, no CIRs: in the below diagram, the Publication Module brings in Data Modules via the Asciidoc include directive. The Publication Module then sends the resulting package to a document processor, which creates PDF, HTML, and more.
Publication Module (PM)
PM is an easy one. A PM equivalent is an asciidoc file that includes other asciidoc files.
That's it.
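As a minimal sketch - the title and the second data module filename here are my inventions; the first DM filename appears later in this article - a PM could be as simple as:

```asciidoc
= Demo Engine Maintenance Manual
:doctype: book

include::DMC-DEMO-000-10-00-01A-280A-A.adoc[leveloffset=+1]
include::DMC-DEMO-000-20-00-01A-520A-A.adoc[leveloffset=+1]
```

One heading, a stack of includes, done.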
We'll be seeing the include directive rather a lot when engineering S1000D equivalence. In programmer-y terms, Asciidoc `include` is an implementation of transclusion, not dissimilar to pmref or topicref or object or xinclude, but far simpler. In Asciidoc, we can have transclusion and we can also have partial transclusion, which we'll talk about later with CIRs.
To help make sense of include, let's take a look at our file system. The file system scheme for our Ascii1000D project might look something like this:
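For illustration - the PMC and ICF filenames here are hypothetical, while the DM and CIR names are the ones used elsewhere in this article - a flat layout could be:

```
PMC-DEMO-00001-00.adoc               (Publication Module)
DMC-DEMO-000-10-00-01A-280A-A.adoc   (Data Module, procedure)
DMC-DEMO-000-10-00-01A-901A-A.adoc   (ICF, illustration control file)
CIR/
  DMC-DEMO-000-00-00-01A-0A4A-A.adoc (CIR, warnings)
```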
If you're from S1000D, a lot of those acronyms and numbers mean something, with the exception of ICF (illustration control file), which is a method for handling applicability (conditional content) for illustrations (ICFs are a way of moving applicability chunks for graphics away from the narrative). I am, unfortunately, assuming that you're bringing in a little S1000D knowledge to this article. If you aren't, merely marvel at the amazing length of these filenames.
Also, wow, that's a flat folder structure, isn't it? Why is that? It's all about relative paths. By default, the Asciidoc processor considers the including document to be the current location - regardless of where the included document might be. When you run the PMC through the processor, it thinks it's in ./PMC even as it processes the included files. That's why `include` is called a pre-processor directive - it pulls the includes in before it starts transforming. We could hack this by using user-defined attributes or built-in attributes like imagesdir in our included files, but for today, let's keep it nice and simple with a flat hierarchy of data modules. PMs and DMs are at the same level, so we don't need to worry about relative paths changing from DM to PM. ./CIR is always the place to find Common Information Repositories, regardless of whether you are running from a DMC or a PMC or somewhere else. One step up, and one step over.
Let's crack open one of those PMs, viewing it in Microsoft Visual Studio Code with the Asciidoctor plugin activated[3].
The Publication Module starts with a level one heading - the PMTITLE of the deliverable - and those ::includes:: can be arranged in whatever nesting of headings (PMENTRIES) you might desire. At the top of the PM, we also have a whole bunch of book metadata Asciidoc carries over from Docbook, which is a good thing - we use all of it.
PMC ::includes:: need to use the leveloffset attribute in the include. Why? A data module will always have a level one heading - the DMTITLE. Included directly, without leveloffset, the DMTITLE will have an argument with the publication module's own level one heading ("Who's the title? I'M THE TITLE"). Leveloffset lets you tell the included file what heading level it's supposed to start at: leveloffset=+2 adds two heading levels to whatever is being included.
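A quick sketch of the idea, using the DM filename from later in this article and a made-up PMENTRY heading:

```asciidoc
== Maintenance Procedures

include::DMC-DEMO-000-10-00-01A-280A-A.adoc[leveloffset=+2]
```

The DM's level one DMTITLE lands at level three, nesting neatly under the level two PMENTRY heading.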
If you're using DocBook-XSL (also present in FOPUB and the "boxed" Asciidoc editor AsciidocFX) at all, don't skip heading levels: if you go to heading 3 from heading 1 with a DocBook processor, it will complain mightily. It expects heading increments to be 1.
The PM also uses document header attributes to tell the processor "I Am a Book" as opposed to "I Am an Article" or "I am a Data Module". PM doc header also contains information like title, author, date, revremarks, all that stuff.
Finally, the publication module is where user-defined attributes are declared for applicability. Asciidoc applicability is very stripped down from the powerful S1000D applicability model, but, on the other hand, it does work out of the box, for free. The PM is just where the applicability is declared - the conditions are used in the data modules. You might see it declared in a PM as follows:
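As a sketch - CONFIG1 is a made-up condition name, used here because it matches the example later in this article - the PM doc header could carry:

```asciidoc
= Demo Engine Maintenance Manual
:doctype: book
:CONFIG1:
```

Declaring the attribute with an empty value simply sets the condition "on" for everything the PM includes.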
Those attributes are global, so they will be in effect for any and all `included` files that the PM brings in. The publication module, then, mirrors precisely the configuration state of its corresponding deliverable.
For example, if you're writing engine documents, and the new engine is a Block IV Flexifuel, then the publication module might have attributes that look like :BLK4: and :FLEXFUEL:, along with (if you track such things) a starting serial number as the respective attribute values. Don't forget that document attributes have names and can carry values as well, and you can use ifeval for serial number range applicability if you have a variant that isn't classified as a block or a mod dot yet . . a subject for later, and a wee bit outside of our publications sandbox.
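A hypothetical sketch of that ifeval idea (the attribute name and serial values are my inventions):

```asciidoc
// in the PM doc header
:serialno: 1234

// in a DM
ifeval::[{serialno} >= 1001]
NOTE: Applies to units serial number 1001 and subsequent.
endif::[]
```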
Anyway, those are your applic declarations, they ride in the PM, and all the DMs inherit them. Which is a pretty good segue into our next item.
Applicability
Whew! PMCs took way too long. Let's dive straight into applicability at the DM level.
Applicability aka Conditional Content in Asciidoc is quite a bit lighter than in other CCM/CCS vocabularies, as shown below.
Conditional Content aka Ascii1000D Applicability is done using Asciidoc Conditional Directives in the DMs. Let's look at a Data Module (DM) included by our PM: `DMC-DEMO-000-10-00-01A-280A-A.adoc`.
Ignore those other includes for the moment. As you might be able to tell from the Information Codes (INCODEs), those are Asciidoc equivalents to Common Information Repositories. We'll get to CIRs in a second.
Notice the ifdef conditional directive in that procedure. When this data module is run by itself, that step - checking for the yellow warning - does not appear. However, when it is included by the PM declaring CONFIG1 as an applicable attribute, this toggles the content "on", thus giving you some customization for shared document components.
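In sketch form - the surrounding step wording is hypothetical, but the yellow-warning step is the one discussed here - the conditional block inside the DM might read:

```asciidoc
. Open the access panel.
ifdef::CONFIG1[]
. Check that the yellow warning light is extinguished.
endif::CONFIG1[]
. Disconnect the harness.
```

Run standalone, the processor skips the ifdef block; run from a PM that declares :CONFIG1:, the step appears.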
See below for a side-by-side of HTML output, one with no applic declared, and the other as called from the PM with CONFIG1 declared.
Note that unless otherwise stated, from here on out all the sample renders are HTML. It's just a heck of a lot faster.
See there? Since the one on the right is being run from a PM with CONFIG1 declared, that procedural step shows up. When it's not declared, the step is suppressed in the output deliverable.
Now you might see what those Illustration Control Files (ICFs) are for. When you have multiple configurations being described by one DM, you need to have a ton of applicability blocks to toggle between all the different graphics, because individual graphics can't really have bits and pieces that can be toggled. Not well, anyway. The ICF gives all those applicability-driven graphics a place to live, so that the writer doesn't really need to worry about juggling those. Whoever's doing the heavy illustration work can do the ICFs, and then the writer just needs to `include` those. When the publication is run from the PM, the PM sets the applicability, and it's persistent all the way down to the ICFs, filtering the applicable graphics.
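A bare-bones ICF might be little more than a stack of conditional image macros - the filenames here are hypothetical:

```asciidoc
ifdef::CONFIG1[]
image::access-panel-config1.png[Access panel, CONFIG1]
endif::CONFIG1[]
ifndef::CONFIG1[]
image::access-panel-baseline.png[Access panel, baseline]
endif::CONFIG1[]
```

The writer includes the ICF once; the PM's attribute declarations pick which graphic survives into the output.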
Now, when it comes to integration with CAD and 3D content, that's a whole other article. Stay tuned!
CIRs
Another usage of the include directive is for bringing in parts of other data modules, an instance of partial transclusion. Let's say we want to use a shared warning. Using the above example, I might have a procedure step that goes like this
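Something like this, say (the step and warning wording are hypothetical; the `+` is Asciidoc list continuation, which attaches the warning to the step above it):

```asciidoc
. Remove the power supply cover.
+
WARNING: High voltage is present at the power supply terminals. Use caution.

. Disconnect connector P1.
```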
This is pretty important! But say I get a call from the safety office. It turns out that the voltage KILLS - and we need to say that everywhere, every single place we have that warning. Over the years, we've worded this warning all sorts of ways, across hundreds of books, so this could get to be a nightmare.
But what if we made a centralized Common Information Repository (CIR) that contains all our warnings, separated into tagged regions? I might have a file ./CIR/DMC-DEMO-000-00-00-01A-0A4A-A.adoc (note the incode, S1000D folks, that IDs it as a CIR), and the warning in that CIR might look like this:
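The tagged region inside that CIR might be sketched as follows - the tag name `warn-hv` is my invention:

```asciidoc
// tag::warn-hv[]
WARNING: High voltage is present at the power supply terminals.
Contact with energized terminals KILLS. Disconnect and lock out
external power before servicing.
// end::warn-hv[]
```

Everything between the `tag::` and `end::` comment lines becomes an addressable chunk.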
Now, to use that CIR, we use a partial include to that tagged region of the CIR in our procedures, everywhere we want that warning to show up. Note the tag name declaration in the include directive below.
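That partial include might be sketched like so - the `warn-hv` tag name and the relative path are assumptions, and the `+` is list continuation so the warning attaches to the step:

```asciidoc
. Remove the power supply cover.
+
include::CIR/DMC-DEMO-000-00-00-01A-0A4A-A.adoc[tag=warn-hv]

. Disconnect connector P1.
```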
Now, when we fix a warning in that CIR (incode 0A4A), it's fixed everywhere, all at once, wherever it's used. Bam! Here's how it looks when we run it.
We can do this for data restrictions (ITAR and export statements), regulatory statements, acronyms, cautions, wiring data, and a giant bucket of other stuff. It's pretty easy, it's relatively simple, and -- are you getting tired of hearing this yet? -- it works right out of the box, for free.
Sort-Of-Conclusion
I'll be doing some more in this series, but this gives you at least a basic idea of where lightweight markup is these days, in terms of supporting topic or component based authoring. It's a very different place than it was in the mid-2000s!
Notes
[1] No, I don't care about GIANT USER MANUAL metrics at the book level - we need tracing at a component or task level. (Also, a git branch should correspond to an ECR/ECN/Engineering-Ticket-Thing, not, repeat, NOT, a book-level deliverable. The latter is far too big and long-lived for a branch.) We need to be able to trace procedures and files, like "Superbomb trigger, variable yield - Inspection." "That proc's been hit with part changes sixty times in the last four months!" is a pretty good red flag for problems upstream of the publications department. Or, different situation, same time frame: you can see Jane Doe's done two hundred sixty commits while John Smith had fifty. What if John's been working on the Superbomb trigger system? Given what we saw with that system, the manager should maybe check and see if Superbomb trigger is something heinously complicated and/or broken. If it is, John probably needs some help from outside the group to piece together whatever the heck is going on with that system. Or the manager could go back to the programs office and tell 'em, "Your stuff's broke, we'll document it when you make something that works." Or, maybe, John's slacking. Or "all of the above."
[2]Anyone who says otherwise is selling you something. The tradeoff for ugly is better consistency, re-use, process efficiency, and more opportunities for integration. To continue the LEGO metaphor, you can theoretically make loads of other toys from the same set of LEGO blocks, which is why they cost so much. Whether that's worth the cost is a decision your business will have to make. And if you're not getting anything out of it, drop topic-based component content like a hot rock. You do not want to be using component content management unless you're re-using at least 75% of the assets. Otherwise, it's like buying and building a brand new LEGO kit for every new toy you wanted to make. This represents a nauseating amount of waste in both human and capital terms. There's a reason LEGO planes can't fly. And if they ever made one that did, it would be hand-crafted from those stupid customized purpose-built pieces that only do the one thing, and now - you poor fool - you've got multiple bespoke configurations to deal with, because the bespoke top level assembly is built from them.
[3] At the end of the day, it's just text, so you could use your favorite text editor with the Asciidoc lexer enabled. Visual Studio Code gives you more functionality than I could comfortably summarize here: customized autocomplete, reference handling, image tools, etc. Alternatively, if you're starting out, you could use the standalone AsciidocFX editor which, although under-maintained, integrates several other useful libraries in one piece of "boxed" software. It's great for learning Asciidoc in general, and the DocBook piece works well. FX does use the older DocBook-XSL (FOPUB) toolchain for PDF output, however, so you will miss some Asciidoc-only features, but you gain a larger degree of freedom when it comes to PDF formatting. DocBook-XSL is very old, very kludgy, but very configurable when it comes to print. For some more fun in PDF land, check out my article on the subject.