AEM Convert Word Documents to DITA

The following steps are from page 32 of the PDF at this link:

https://helpx.adobe.com/content/dam/help/en/xml-documentation-solution/4-0-1/XML-Documentation-for-Adobe-Experience-Manager_Installation-Configuration-Guide_EN.pdf

The AEM Guides (XML Documentation) solution lets you migrate your existing Word documents (.docx) to DITA topics. You specify the input and output folder locations along with other parameters, and the document is converted into a DITA document. Depending on the content, you could get a .dita file and a .ditamap file.

To be able to convert a Word document successfully, your document should be well structured. For example, your document should have a Title, followed by Heading 1, Heading 2, and so on. Each of the headings should have some content in it. If your document is not well structured, the process might not work as expected.
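As a rough pre-check before uploading, the structure described above can be inspected directly in the .docx file, which is a ZIP archive containing word/document.xml. The sketch below is not part of AEM Guides. It assumes the paragraphs use the built-in style IDs that English-language Word typically writes (Title, Heading1, Heading2, and so on), and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_styles(document_xml):
    """Return one entry per paragraph: its style ID, or None if unstyled."""
    root = ET.fromstring(document_xml)
    styles = []
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        styles.append(style.get(f"{{{W_NS}}}val") if style is not None else None)
    return styles


def structure_problems(styles):
    """List obvious structure problems (assumes English built-in style IDs)."""
    problems = []
    if not styles or styles[0] != "Title":
        problems.append("document does not start with a Title")
    if not any(s and s.startswith("Heading") for s in styles):
        problems.append("document has no Heading paragraphs")
    return problems


def check_docx(path):
    """Read word/document.xml from a .docx file and check its structure."""
    with zipfile.ZipFile(path) as docx:
        return structure_problems(paragraph_styles(docx.read("word/document.xml")))
```

Calling `check_docx("my-document.docx")` (a hypothetical file name) returns an empty list when no problem is found. Documents that use custom or localized style names would need a different set of style IDs.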

By default, XML Documentation solution uses the Word-to-DITA (Word2DITA) transformation framework. This transformation depends on the style-to-tag mapping configuration file. To be able to use the Word2DITA transformation successfully, you must consider the following guidelines for preparing your Word document for conversion:

NOTE: If you make any changes to the default style-to-tag mapping configuration file, you must update the guidelines and follow the version conforming to your updated style mapping.

  • Ensure that your document starts with a Title; this Title is mapped to the DITA map title. Also, the Title must be followed by some regular content.
  • After the Title, there should be Heading 1, Heading 2, and so on. Each Heading must have some content in it. The Headings are converted into new Concept type topics. The hierarchy of the generated topics is as per the Heading levels in the document, for example, Heading 1 will precede Heading 2, and Heading 2 will precede Heading 3 content.
  • The document must have at least one Heading type content.
  • Ensure that you do not have any grouped images. In case you have grouped images in your document, ungroup all such images.
  • Remove all headers and footers.
  • Inline styles such as bold, italics, and underline are converted into <b>, <i>, and <u> elements.
  • All ordered and unordered lists are converted into <ol> and <ul> elements. This also applies to nested lists, lists within tables, notes, or footnotes.
  • All hyperlinks are converted into <xref>.
  • The filename of each converted file is based on the heading text followed by a file number. The file number is a sequential number based on the position of the heading text in the document. For example, if a heading text is “Sample Heading” and it is the 10th heading in the document, then the resultant filename for this topic will be similar to Sample_Heading_10.dita.
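The naming rule in the last guideline can be sketched in a few lines of Python. This is only an illustration of the pattern shown in the example. The actual conversion may treat punctuation or special characters differently, and the function name is made up here.

```python
import re


def topic_filename(heading_text, position):
    """Approximate the converted topic filename: heading text plus heading position."""
    # Assumption: runs of whitespace become a single underscore.
    stem = re.sub(r"\s+", "_", heading_text.strip())
    return f"{stem}_{position}.dita"


print(topic_filename("Sample Heading", 10))  # Sample_Heading_10.dita
```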

Perform the following steps to convert your existing Word documents into DITA topic type documents:

1) Log into AEM and open the CRXDE Lite mode.

2) Navigate to the default configuration file available at the following location: /libs/fmdita/config/w2d_io.xml

3) Create an overlay node of the config folder within the apps node.

4) Navigate to the configuration file available in the apps node: /apps/fmdita/config/w2d_io.xml

The w2d_io.xml file contains the following configurable parameters:

  • In the inputDir element, specify the location of the input folder wherein your source Word documents are available. For example, if your Word documents are stored in a folder named wordtodita in projects folder, then specify the location as: /content/dam/projects/wordtodita/
  • In the outputDir element, specify the location of the output folder, or keep the default output location, to save the converted DITA document. If the specified output folder does not exist on DAM, then the conversion workflow creates the output folder.
  • For the createRev element, specify whether a new version of the converted DITA topic is to be created (true) or not (false).
  • In the s2tMap element, specify the location of the map file that contains mappings for Word document styles to DITA elements. The default mapping is stored in the file located at: /libs/fmdita/word2dita/word-builtin-styles-style2tagmap.xml
  • NOTE: For more information about the structure of word-builtin-styles-style2tagmap.xml file and how you can customize it, see Style to Tag Mapping in DITA For Publishers User Guide.
  • In the props2Propagate element, specify the properties that should be passed on to the DITA map. This property is required to pass on the default metadata like dc:title,dc:subject,dam:keywords,dam:category from document metadata to converted DITA assets.

5) Save the w2d_io.xml file.

6) After configuring the required parameters in the w2d_io.xml file, log into AEM and open the Assets UI.

7) Navigate to the input folder location (wordtodita).

8) Upload the source Word documents into this folder. For information on uploading content to DAM, refer to the Adobe documentation.

Within w2d_io.xml, you can define one or multiple blocks of configurations for conversion. Once the documents are uploaded, the conversion workflow gets executed and the final output in the form of a DITA topic is saved in the location specified in the outputDir element.
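Before saving an edited w2d_io.xml, it can help to sanity-check the five parameters described in step 4. The following Python sketch is not an Adobe tool. It looks up each element by name anywhere in the XML, so it does not depend on the real file layout. The `settings` wrapper element and the output path in the sample are hypothetical, and the assumption that the three location values are absolute repository paths is based only on the examples given above.

```python
import xml.etree.ElementTree as ET

# Element names as listed in step 4
PARAMETERS = ("inputDir", "outputDir", "createRev", "s2tMap", "props2Propagate")


def read_settings(xml_text):
    """Collect the text of each known parameter element (None if absent)."""
    root = ET.fromstring(xml_text)
    settings = {}
    for name in PARAMETERS:
        element = root.find(f".//{name}")
        settings[name] = (element.text or "").strip() if element is not None else None
    return settings


def setting_problems(settings):
    """Flag values that look wrong under the assumptions stated above."""
    problems = []
    for name in ("inputDir", "outputDir", "s2tMap"):
        value = settings.get(name)
        if not value or not value.startswith("/"):
            problems.append(f"{name} should be an absolute repository path")
    if settings.get("createRev") not in ("true", "false"):
        problems.append("createRev should be true or false")
    return problems


# Hypothetical sample: the wrapper element and the output path are made up.
SAMPLE = """
<settings>
  <inputDir>/content/dam/projects/wordtodita/</inputDir>
  <outputDir>/content/dam/projects/wordtodita/output/</outputDir>
  <createRev>true</createRev>
  <s2tMap>/libs/fmdita/word2dita/word-builtin-styles-style2tagmap.xml</s2tMap>
  <props2Propagate>dc:title,dc:subject,dam:keywords,dam:category</props2Propagate>
</settings>
"""

print(setting_problems(read_settings(SAMPLE)))  # []
```

For the real file, copy the layout from the default at /libs/fmdita/config/w2d_io.xml rather than from this sample.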


Screen shot of /apps/fmdita/config/w2d_io.xml on my local AEM Author
