Structured and unstructured formats

The information contained in a technical document can be categorized according to its meaning. By default, DITA XML offers three basic types:

Type	Description
concept	Introduction or presentation of a concept.
task	Sequential, numbered, step-by-step procedure for performing a task.
reference	Reference information on a list of items, such as program options.

In an unstructured format such as FrameMaker’s traditional format, the technical writer is under no obligation to organize the information according to its meaning. If rigorous editing rules are not followed, the information provided to the user is likely to be unclear and difficult to navigate quickly.

With structured formats such as DITA XML, on the other hand:

the technical writer concentrates on content,
information is presented to the user in a coherent, predictable organization,
access to information is sequential and rapid,
information can be easily reorganized as required,
the usability of the documentation provided is optimal.

High-level information types such as task are divided into lower-level types, for example:

Element	Description
prereq	List of mandatory items required to complete a task.
steps	Series of procedural steps.
stepxmp	Example of step completion.

Syntax rules prohibit the technical writer from including a step-by-step procedure in a section of any type other than task. This provides the technical writer with a real writing model to help them present information:

Characteristic	Description
Minimalist	A task section contains only prerequisites, a procedure, and a few other specific elements. All conceptual or reference information is placed in separate sections.
Comprehensive	A task section without a procedure is not a valid DITA XML section and cannot be published. It is possible to implement a mechanism to automatically check for the presence of mandatory information blocks as defined by the DITA XML XSD schema.
Consistent	Information of the same type is presented in the same order and with the same layout; identical blocks of information repeated in different places, such as a remark, come from the same source and are therefore strictly identical.
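
The automatic check mentioned above can be approximated with a few lines of shell. This is only a minimal sketch, not schema validation: it assumes one topic per .dita file and looks for the literal tags <task>, <steps>, and <steps-unordered>. The function name check_task_steps is hypothetical.

```shell
#!/bin/sh
# Crude presence check, not schema validation: report every DITA task
# topic in a directory that contains no <steps> or <steps-unordered> block.
# Assumption: each .dita file holds a single topic.

check_task_steps() {
    dir=${1:-.}
    status=0
    for f in "$dir"/*.dita; do
        [ -e "$f" ] || continue               # the glob matched nothing
        grep -q '<task[ >]' "$f" || continue  # only task topics are checked
        if ! grep -Eq '<steps(-unordered)?[ >]' "$f"; then
            echo "missing steps: $f"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_task_steps on a directory of topics prints one line per task without a procedure and returns a non-zero status if any were found, so it can act as a gate before publication. A real publication chain would rather validate the files against the DITA schema.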

DocBook or DITA XML?

Some companies have existing content in DocBook format. Often managed by the company’s most technical staff, it coexists with other content in FrameMaker or word-processing format. If the decision is made to consolidate all corporate content in a single format, it seems natural to capitalize on the efforts made in the DocBook creation and publication chain and to select this format. However, this would mean missing out on the spectacular productivity gains offered by DITA XML.

It’s easy to generate DocBook from DITA XML. DITA-OT offers this target format by default, just like PDF or HTML. The reverse operation cannot be fully automated. Why not?

A non-reversible process

It’s not possible to automatically migrate data from information-poor to information-rich formats.

The reason is simple: content in DITA XML format contains more information. Switching from a richer format to a poorer one, for example generating a PDF from DITA XML, is an entropic operation that is easy to automate. The reverse operation requires injecting intelligence, which only humans can do today.

If your content were a photo, we could make the following analogy:

Content format	Photo format
DITA XML	RAW
DocBook	TIFF
PDF	JPEG

Switching from RAW to TIFF and from TIFF to JPEG is destructive and cannot be reversed.

PDF is semantically poorer than DocBook, which is itself poorer than DITA XML.
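
The loss of information can be illustrated with a small shell sketch. The strip_markup function is a hypothetical and deliberately naive stand-in for a real publishing transformation: it drops all tags and keeps only the text. The two sample fragments are invented for the illustration.

```shell
#!/bin/sh
# Lossy conversion: drop every tag and keep only the text.
# (A deliberately naive stand-in for a real publishing transformation.)
strip_markup() {
    sed 's/<[^>]*>//g'
}

# Two semantically different DITA fragments...
note='<note type="important">Back up your files first.</note>'
para='<p>Back up your files first.</p>'

# ...produce exactly the same output once the markup is gone.
printf '%s\n' "$note" | strip_markup
printf '%s\n' "$para" | strip_markup
```

Both commands print the same sentence. Nothing in that output says whether the text was an important note or an ordinary paragraph, which is why going back to the richer format needs a human decision.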

If your company insists on using DocBook, you can always generate DocBook content from DITA XML source content. As long as the source content remains in DITA XML format (i.e., as long as no changes to the DocBook content are saved), and as long as the DocBook format is only a step in the generation of deliverables, you benefit from the advanced content reuse features offered by DITA XML.

The effort involved in migrating from an unstructured format to DITA XML is a little greater than to DocBook, since you need to inject more semantic information. You also have to migrate the DocBook content to DITA XML, which also represents an effort, albeit a smaller one. But your content is immediately of better quality because it’s more structured. You’ll soon reap the rewards of your hard work, especially if you’re thinking of translating your content into a new language.

Generally speaking, it’s always in a professional’s interest to work on the richest format, if only to be proactive and anticipate new needs.

FrameMaker Migration to DITA XML

Migrating from FrameMaker to DITA XML is not like saving an MS Word document in LibreOffice format. There is no automatic process for migrating an unstructured document to a structured format. In the worst-case scenario, depending on the quality of your original document, this can be like transforming a wasteland into a structured garden. But a well-planned migration allows you to switch to the new format without disrupting the rhythm of your deliveries.

To use a metaphor, if you set yourself the goal of converting a swamp into a French garden, you’d have to go through the English garden stage—a place that may not be strictly architectural, but that’s very pleasant to live in. Good news: if the technical writer has consistently used a limited set of styles and rationally organized their FrameMaker content, they’re certainly already very close to this stage.

Migration from FrameMaker to DITA XML

By the way, if, for any reason, your migration project were to stop there, the technical writers, the company, and the users would already have gained a great deal, respectively, in:

ease of updating,
consistency and speed of publication of new versions,
easier access to information.

FrameMaker Content Restructuring

The automated part of a migration from FrameMaker to DITA XML consists of applying a conversion table between FrameMaker styles and DITA XML structures.

However, a significant amount of restructuring of the FrameMaker document must be carried out beforehand:

restructuring of information according to the three categories: concept, task, and reference,
elimination of overrides (text properties applied manually and overwriting styles; this kind of modification is, if not impossible, at least very limited in a structured format),
harmonization and simplification of FrameMaker styles to limit them and match them to the DITA XML tags that will be used (for example, a note_important style to the <note type="important"> tag). It is necessary to analyze the existing content beforehand and decide which set of tags will be used from among the hundreds proposed by DITA XML: it is strongly inadvisable to use them all.
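
To analyze which styles are actually in use before harmonizing them, one option is to save the FrameMaker files as MIF (FrameMaker's text interchange format) and count the paragraph tags. The sketch below assumes that style names appear in MIF as <PgfTag `name'>. The function name list_pgf_styles is hypothetical, and grep -o is a widely available extension rather than strict POSIX.

```shell
#!/bin/sh
# Count how often each paragraph style is referenced in MIF exports.
# Assumptions: the files were saved as MIF from FrameMaker, and style
# names appear as <PgfTag `name'> (backquote, name, apostrophe).
# Style definitions in the paragraph catalog use the same statement,
# so they are counted too.

list_pgf_styles() {
    grep -oh "<PgfTag \`[^']*'" "$@" |
        sed "s/^<PgfTag \`//; s/'\$//" |
        sort | uniq -c | sort -rn
}
```

Running list_pgf_styles on the .mif files prints each style name with its number of occurrences, most frequent first. Styles that appear only a handful of times are natural candidates for merging into a more common style before the conversion table is written.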

Restructuring FrameMaker content and setting up the DITA XML chain

This harmonization work can be carried out in parallel with updating and publishing the FrameMaker document. The quality of this document will be all the better for it. Simultaneously with reorganizing your content, you can implement the complete DITA XML creation, management, and publication chain on a sample of your content:

set up the tools,
create style sheets for the various output formats,
provide training for technical writers, graphic designers, and translators,
conduct training and awareness-raising for other company staff.

Only when the chain is reliable and accepted, and even expected by the other players in the company, can the technical writer consider migration.

If your documents are available in several languages, you need to modify the FrameMaker files and perform the migration for each language. If you’re planning to translate your documents into a new language, it’s best to migrate them first!

FrameMaker to DITA XML Conversion Table

Once the FrameMaker files are ready for migration and the DITA XML chain has been fully integrated into the company’s technical and human processes, the technical writer can apply the conversion table.

You should now be able to archive the FrameMaker files and then switch over completely to DITA XML.

Applying a conversion table from FrameMaker to DITA XML

Of course, you should first apply this process to a small set of documents, if possible one that is not of critical importance. After this initial success, you can apply the process to other document sets.

You can now progressively modularize and share your content in the new format to get the most out of DITA XML. During this phase, you can continue to publish new versions of the document; in fact, publishing should be much simpler than with FrameMaker.

Migrating from FrameMaker to DITA XML

The aim of this procedure is to:

migrate FrameMaker content to DITA XML without having to delve into the complexities of FrameMaker EDDs (small projects only!),
manage technical documentation in DITA XML format, without using structured FrameMaker.

Restructure the content and styles of your FrameMaker content files according to DITA XML concepts.
Create an empty FrameMaker document and import all existing styles from the files to be migrated.
Apply all available styles to empty paragraphs in the empty FrameMaker document.
Save the empty FrameMaker document as styles.fm.
Open FrameMaker Structured 11 and create a new DITA XML topic file.
Choose StructureTools ‣ Export element catalog as EDD and save the new EDD as DITA-topic-edd.fm.
Open the styles.fm file, then choose File ‣ Import element definitions and import the element definitions from DITA-topic-edd.fm.
Repeat the above three steps for the other DITA topic types (task, reference, etc.), modifying the file names as appropriate.
Open the styles.fm file, then choose StructureTools ‣ Generate conversion table.
Edit the conversion file and map each FrameMaker style to a DITA XML tag.
Save the conversion table as DITA2FM-conversion-table.fm.
Open a FrameMaker content file under FrameMaker Structured 11 and choose StructureTools ‣ Utilities ‣ Structure current document.
Select DITA2FM-conversion-table.fm and click Add structure.
Save the FrameMaker content file in XML format without selecting an application.
Open the generated XML file in a DITA XML editor and correct the DITA XML syntax. Some aspects of this step can be scripted, but you’ll also need to restructure the content manually. In particular, you’ll need to place cross-references by hand, preferably in a reltable.

To generate the elements needed to build a ditamap file, you can use Perl scripts such as:

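#!/usr/bin/perl
use strict;
use warnings;

# Read the whole file named on the command line into a single string.
my $file = $ARGV[0] or die "Usage: $0 FILE\n";
open(my $in, '<', $file) or die "Cannot read $file: $!";
my $content = join('', <$in>);
close($in);

# substitution: delete everything from <body> to the first </body>
$content =~ s#<body>.*?</body>##igs;

# Write the result back to the same file.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} $content;
close($out);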
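
The <topicref> entries of the ditamap can be produced in the same spirit. The following is a minimal shell sketch rather than a complete map generator: it assumes that all topics sit directly in one directory, that the ditamap will be stored alongside them, and that file names need no XML escaping. The function name print_topicrefs is hypothetical.

```shell
#!/bin/sh
# Print one <topicref> element per .dita file found in a directory.
# Assumptions: topics sit directly in that directory, file names contain
# no characters that need XML escaping, and the ditamap will be stored
# alongside them (so href is just the file name).

print_topicrefs() {
    dir=${1:-.}
    for f in "$dir"/*.dita; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '  <topicref href="%s"/>\n' "$(basename "$f")"
    done
}
```

The output can be pasted between the <map> and </map> tags of a ditamap and then reordered or nested by hand.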

You can also easily modularize content with the xml_split tool or with the Perl XML::Twig module. The following Bash one-liner prints the mv commands needed to rename .dita files after their title:

$ ack "<title>" *.dita| sed "s# #_#g;" |
tr '[:upper:]' '[:lower:]' |
sed -E "s#(.*.dita)#mv \1#g;" |
sed -E "s#\.dita.*<title>(.*)</title>#.dita \1.dita#g;"
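
The pipeline above only prints commands. As a variant, the rename can be wrapped in a small function that performs it directly. This is a sketch under stated assumptions: the <title> element sits on a single line and contains no nested markup, and characters other than ASCII letters and digits are replaced by underscores. The function name rename_by_title is hypothetical.

```shell
#!/bin/sh
# Rename each .dita file in a directory after its first <title>.
# Assumptions: the <title> element sits on a single line and contains
# no nested markup; existing files are never overwritten.

rename_by_title() {
    dir=${1:-.}
    for f in "$dir"/*.dita; do
        [ -e "$f" ] || continue
        # Text of the first <title>...</title> found in the file.
        title=$(sed -n 's#.*<title>\(.*\)</title>.*#\1#p' "$f" | head -n 1)
        [ -n "$title" ] || continue
        # Lowercase, then turn every run of other characters into "_".
        slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' |
            sed 's#[^a-z0-9]\{1,\}#_#g; s#^_##; s#_$##')
        [ -n "$slug" ] || continue
        target="$dir/$slug.dita"
        [ -e "$target" ] || mv -- "$f" "$target"
    done
}
```

It is safer to try this on a copy of the files first, since references between topics (href values in maps and cross-references) are not updated by the rename.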