From document to modular document base

The book model remains predominant for creating and managing information. However, enterprise content is often scattered across numerous documents in various formats, resulting in duplication, inconsistencies, high update and translation costs, and delivery delays. More efficient models are available to technical writers.

The DITA XML structured authoring format offers a transition from the book model to a modular document base. Enterprise content is composed of single building blocks that can be dynamically assembled on demand to create documents in different target formats.

Modular documentation offers unparalleled flexibility

The volume of source content is minimized, reducing the costs of creating, updating, and translating corporate content. Additionally, the technical writer can manage the writing, validation, and translation processes module by module. Workflows can thus be parallelized, reducing time-to-market.

DITA XML files can also be easily centralized in a single repository, such as an enterprise content management (ECM) system or a version control system (VCS). This approach preserves the company’s intangible capital.

A language with tags

DITA XML is a tag-based language: the technical writer structures the information in source files without layout, similar to computer code source files. The user receives a target document, such as a PDF file, where tags are replaced by typographical formatting.

If your company provides customers with technical documentation in MS Word format, the technical writer and the user share the same information media, with no distinction between source and target files. However, while this might seem like the simplest solution, it is not very effective in terms of productivity for the technical writing team and information structuring.

With a text format like DITA XML, the technical writer and reader have very different media at their disposal:

Role	Description
Technical writer	The technical writer manipulates source files using tags to construct the document, marking the information elements they create or reuse. Tags are nested according to a rigorous syntax. The source file is not in WYSIWYG format; the layout is applied when the source files are transformed into target files, i.e., when the deliverables are generated. Some graphical editors, such as XMetaL, oXygen, or structured FrameMaker, offer a WYSIWYM view, where tags are replaced on screen by a generic layout that differs from the document’s final appearance. The advantage of a markup language is that you can see exactly what you’re doing by manipulating the markup yourself, without delegating interpretation to a graphical editor.
User	Only content is presented to the reader in the target file; text marked with tags in source files has a typographical emphasis, the meaning of which is explained in the Typographical Conventions section of the final document.

A DITA XML source file is a mixture of text and tags, the tags being delimited by the < and > signs. The text itself is enclosed between an opening tag of type <tag> and a closing tag of type </tag>, following the <tag>text</tag> pattern. Any text entered outside an opening and closing tag pair is incorrect and produces an invalid file.
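
As a minimal sketch of this pattern, the fragment below shows the body of a concept topic. The wording of the paragraph is an invented example; the element names (conbody, p, uicontrol) are standard DITA elements.

```xml
<conbody>
  <!-- Valid: the text is enclosed between an opening <p> and a closing </p> tag -->
  <p>Press <uicontrol>OK</uicontrol> to confirm.</p>
  <!-- Typing a sentence directly at this level, outside any <p> element,
       would make the file invalid -->
</conbody>
```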

High-level information typology

DITA XML provides the technical writer with a high-level typology to help structure content.

When creating a new document in FrameMaker, DocBook, or a word processing format, the technical writer begins with a blank page. Depending on their professional rigor, the information transmitted to the user will range between two poles:

Pole	Description
Rational organization	The user has quick and easy sequential access to the information they need.
Informative magma	The user has to read an entire section, or even the entire document, to find useful information.

DITA XML, by contrast, offers three main types of information:

Topic type	Description
concept	General text, such as an introduction or presentation.
task	Step-by-step procedure for performing a task.
reference	Reference information, such as command parameter explanations.

Each of these high-level categories has its own set of lower-level tags. If the technical writer is writing a technical document, the information they collect likely falls into one of these three categories. This division into types of information mandates that the technical writer structure the information effectively. Consequently, users can access information more easily and quickly, enhancing the overall usability of technical documentation.
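
To illustrate how a topic type brings its own lower-level tags, here is a minimal sketch of a task topic. The id, title, and step wording are hypothetical; the elements (taskbody, prereq, steps, step, cmd, stepresult, result) belong to the standard DITA task type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example topic: id and content are invented for illustration -->
<task id="install_widget">
  <title>Installing the widget</title>
  <taskbody>
    <prereq>You need administrator rights.</prereq>
    <steps>
      <step><cmd>Download the installer.</cmd></step>
      <step>
        <cmd>Run the installer.</cmd>
        <stepresult>The setup wizard opens.</stepresult>
      </step>
    </steps>
    <result>The widget is installed.</result>
  </taskbody>
</task>
```

A concept topic would use conbody instead of taskbody, and a reference topic would use refbody; the step-oriented elements shown here are only available in a task.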

On-demand content organization

Information bricks can be assembled on demand in external table of contents structures, called ditamaps.

The organization of information in DITA XML is flexible. Bricks can be organized in different hierarchical structures according to changing needs. If the technical writer carefully builds atomic, generic information bricks, they can create the following documents, much as a car manufacturer keeps releasing new models by assembling standardized parts:

Document	Content
Themes systematically organized into concepts and step-by-step procedures	Presentation document
Presentation document	Concepts
Quickstart	Step-by-step procedures
Reference manual	Reference information
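
As a rough sketch of this assembly on demand, the two ditamaps below build two different documents from the same pool of topics. All file names and titles are hypothetical, and the XML declaration and DOCTYPE are omitted for brevity.

```xml
<!-- quickstart.ditamap (hypothetical): step-by-step procedures only -->
<map>
  <title>Quickstart</title>
  <topicref href="tasks/install_widget.dita" type="task"/>
  <topicref href="tasks/configure_widget.dita" type="task"/>
</map>

<!-- reference_manual.ditamap (hypothetical): reference information only -->
<map>
  <title>Reference manual</title>
  <topicref href="reference/widget_options.dita" type="reference"/>
  <topicref href="reference/widget_commands.dita" type="reference"/>
</map>
```

Each map would be saved as its own file; the topic files themselves are not modified when a new map is created.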

To achieve this, the technical writer must place context-specific elements in ditamap structures rather than in DITA XML content files. Specifically, cross-references should be indicated in a reltable within the ditamap: if document A refers to document B in ditamap 1, it should also be usable without modification in ditamap 2, where document B is not included.
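
A minimal sketch of such a reltable follows, assuming two hypothetical topics. The link between them is declared in the map, so neither topic file contains a hard-coded cross-reference to the other.

```xml
<!-- user_guide.ditamap (hypothetical file names throughout) -->
<map>
  <title>Widget user guide</title>
  <topicref href="concepts/about_widget.dita" type="concept"/>
  <topicref href="tasks/install_widget.dita" type="task"/>
  <reltable>
    <!-- Topics placed in different cells of the same row are linked to each other -->
    <relrow>
      <relcell><topicref href="concepts/about_widget.dita"/></relcell>
      <relcell><topicref href="tasks/install_widget.dita"/></relcell>
    </relrow>
  </reltable>
</map>
```

A second ditamap that includes only one of the two topics simply omits the row, and the topic remains usable without modification.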

The organization of working directories must also facilitate the use of relative links, particularly to images, to prevent broken links.
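
For instance, assuming a hypothetical layout in which concepts/ and images/ are sibling directories, a topic could reference an image through a relative path as sketched below.

```xml
<!-- In concepts/about_widget.dita; the directory layout and file names are assumptions -->
<fig>
  <title>Widget overview</title>
  <image href="../images/widget_overview.png">
    <alt>Diagram of the widget's main components</alt>
  </image>
</fig>
```

As long as the two directories keep the same relative position, the link survives a move of the whole project.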

Single-sourcing: One source, multiple targets

Single-sourcing has long been a contentious issue among technical writers: should different technical writing media, such as online help and a printed manual, offer radically different content, or can they be generated from the same source content?

Productivity constraints and cost-cutting have driven the debate in favor of single-sourcing. Distinct content for each medium may bring a qualitative gain, but that gain is debatable and does not outweigh the cost of creating, maintaining, and translating a distinct source version for each target version.

One set of information, multiple output formats

If the technical writer employs single-sourcing, they must select the paradigm—book or online help—at the project’s outset. Traditionally, tools were based on either a book-like document (MS Word or FrameMaker) exportable to online help format or a Windows help source file (RTF), exportable to PDF. A significant loss of navigation information (indexes, cross-references, links, etc.) often occurred during this process.

DITA XML provides a target-format agnostic model. Source files, although based on a modular model akin to online help, can be easily exported as PDF files, online help, linked HTML pages, or other formats without any loss of information.

Topics: Basic DITA XML information modules

Topics are the smallest autonomous information units managed by DITA XML. Each topic comprises a title and body text, focusing on a single subject. It is the technical writer’s responsibility to leverage the modularity offered by DITA XML to structure the information effectively.
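
A minimal sketch of such a unit, here a concept topic with a title and a body focused on a single subject. The id and the text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical example topic: one title, one body, one subject -->
<concept id="about_widget">
  <title>About the widget</title>
  <conbody>
    <p>The widget synchronizes settings between devices.</p>
  </conbody>
</concept>
```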

Topics are semantically typed. Ideally, a specific type of topic is aligned with each type of information. By default, DITA XML provides topics suitable for software documentation (concept and task descriptions, command lists, etc.), though new topic types can be developed to cater to other requirements.

Topics constitute one of the main differences between DITA XML and DocBook, which does not offer a typology of information bricks.

Topics are typically stored flat in directories organized by topic type. They are hierarchically organized in ditamap files and can be shared across different documents. Module titles are not assigned a title level. As the module structure is perfectly homogeneous, a module can hold a level 3 in one document and a level 1 in another without any need to modify the topics.
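
The sketch below illustrates this with two hypothetical ditamaps: the title level of a topic follows from its nesting depth in the map, not from anything in the topic file. The XML declaration and DOCTYPE are omitted for brevity.

```xml
<!-- user_guide.ditamap (hypothetical): widget_options.dita sits at level 3 -->
<map>
  <title>User guide</title>
  <topicref href="concepts/about_widget.dita">
    <topicref href="tasks/install_widget.dita">
      <topicref href="reference/widget_options.dita"/>
    </topicref>
  </topicref>
</map>

<!-- cheat_sheet.ditamap (hypothetical): the same topic sits at level 1 -->
<map>
  <title>Cheat sheet</title>
  <topicref href="reference/widget_options.dita"/>
</map>
```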

Atomic units of information, such as notes, paragraphs, or even sentences and sentence fragments, cannot be given a title and therefore do not form topics. However, they can be shared using the conref mechanism, similar to the XInclude mechanism available in DocBook.
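
As a minimal sketch of the conref mechanism, a reusable note can be stored once in a hypothetical shared topic and pulled into other topics by reference. The file name, ids, and relative path are assumptions; the conref value follows the file#topic_id/element_id pattern.

```xml
<!-- shared/warnings.dita (hypothetical): a topic that only holds reusable fragments -->
<topic id="warnings">
  <title>Shared warnings</title>
  <body>
    <note id="backup_first" type="caution">Back up your data before you continue.</note>
  </body>
</topic>

<!-- In any other topic stored in a sibling directory: reuse the note by reference -->
<note conref="../shared/warnings.dita#warnings/backup_first"/>
```

When the deliverable is generated, the empty note is replaced by the content of the referenced one, so a correction made in the shared file reaches every document that reuses it.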

Managing DITA XML content with or without a CMS?

The DITA XML architecture does not offer a native document workflow mechanism. Yet workflows are integral to an efficient content lifecycle management process, and providing them is the main purpose of a content management system (CMS).

A CMS also manages metadata, enabling more efficient searches of existing information and the management of backlinks.

Most companies hesitate to implement a CMS, often because of previous failures with such solutions.

Moreover, one of the great advantages of DITA XML is its direct integration into existing information systems. For software companies, it is straightforward to integrate DITA XML into an existing source management system, whether Git, Subversion, or SourceSafe, all on a tight budget. This frequently negates the need to invest in a CMS. However, some companies report spectacular productivity gains, such as Epson America, which was able to reuse up to 90% of existing content on new projects after implementing a DITA XML-compatible CMS.

If you opt for a CMS, it must explicitly support DITA XML: information bricks cannot be managed like a monolithic document. Hence, general-purpose solutions like SharePoint or Alfresco should be set aside in favor of dedicated alternatives such as Componize or DocZone.

Regardless of the initial choice, a change of strategy is feasible at any time without affecting the existing content. The DITA XML architecture is not tied to any particular repository, so a project can start without a CMS and move to a DITA XML-compatible CMS later if this proves beneficial.

Additional resources

Git: from file to content