Source format
The content of a technical writing project is created in a source format, which is distinct from the format of the deliverables, or target format. To use an analogy common in software development, the source format is the recipe, while the target format is the dish. In photography, the source format is the RAW file produced by the camera, which professional photographers prefer for retouching, and the target format is JPEG.
Word processors have accustomed us to blurring the line between form and content. However, confusing the two leads to numerous errors and wasted time.
The document presented to the user comprises two fundamental aspects:
- Content
- Layout
During the development of technical documentation, these two aspects must be clearly distinguished and can be managed by different individuals:
- The technical writer
- The graphic designer
When layout is as essential as content, or when it needs to be varied, as in a sales brochure, writing and layout are performed using separate tools:
- Text editor
- Desktop publishing (DTP) software like InDesign or Scribus
When layout is less critical than content, or when it needs to be consistent, as in technical documentation, content and layout are managed either in the same files or in different files:
| Content and layout | Example |
|---|---|
| In the same files | FrameMaker files |
| In different files | XML content files and an XSLT stylesheet |
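The "different files" approach can be illustrated with a minimal Python sketch that stands in for the XML-plus-XSLT pairing. The content lives in one XML string with no presentation information, and two hypothetical layout functions (`render_web` and `render_print`, not part of any real toolchain) render that same content in two ways. A real pipeline would use an XSLT processor; the sketch only shows that the content never changes when the layout does.

```python
import xml.etree.ElementTree as ET

# Content only: no fonts, margins, or colors.
# The element names are illustrative, not taken from a specific schema.
CONTENT = """<procedure>
  <title>Replace the filter</title>
  <step>Switch off the device.</step>
  <step>Remove the old filter.</step>
</procedure>"""


def render_web(root: ET.Element) -> str:
    """Hypothetical 'web' layout: an HTML heading and an ordered list."""
    items = "".join(f"<li>{step.text}</li>" for step in root.findall("step"))
    return f"<h2>{root.findtext('title')}</h2><ol>{items}</ol>"


def render_print(root: ET.Element) -> str:
    """Hypothetical 'print' layout: upper-case title and numbered lines."""
    lines = [root.findtext("title").upper()]
    for number, step in enumerate(root.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)


root = ET.fromstring(CONTENT)
print(render_web(root))
print(render_print(root))
```

Changing the look of every deliverable means editing one layout function (or one stylesheet), never the content file.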
In a FrameMaker file, the separation between content and form is significant, though not complete: content and layout reside in the same file. FrameMaker applies a consistent page layout to an entire file but allows manual addition of layout elements. The same layout can be duplicated for the entire document, or different layouts can be used for each file comprising the document.
Source formats: degree of modularity and format
Source formats can be classified by their degree of modularity and file format.
Structured XML formats such as DocBook and DITA XML apply a uniform page layout to an entire document and do not permit the manual addition of layout elements or the application of different layouts to the various files that comprise the document.
| Format | Manual layout options |
|---|---|
| MS Word | Yes |
| FrameMaker | Yes |
| DITA XML | No |
When content and layout are closely intertwined, as in a word processor, modifying the content without disrupting the layout is challenging. Consequently, each time a new version of technical documentation is published, the technical writing team spends long hours correcting layout errors generated by the software. This issue is less prevalent with FrameMaker but remains significant. It is nonexistent with DITA XML and DocBook (the only errors that can occur are compilation errors due to incorrect XML syntax; these are easily corrected).
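XML syntax errors are easy to correct because a parser reports exactly where they occur. The following Python sketch uses the standard library's `xml.etree.ElementTree` to check that a topic is well-formed and to return the line and column of the first error. It checks well-formedness only; validating against the DocBook or DITA grammar requires a validating parser, which this sketch does not attempt. The function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position  # position of the first syntax error
        return line, column, str(error)
    return None


good = "<topic><title>Install</title><body><p>Run setup.</p></body></topic>"
bad = "<topic><title>Install</title><body><p>Run setup.</body></topic>"

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # points at the mismatched closing tag
```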
Technical documentation source files are in:
- Binary or
- Text format
This format is also:
- WYSIWYG or
- Structured
Finally, this format is:
- Modular or
- Monolithic
This last aspect determines which single-sourcing logic the format follows:
- Book-to-online-help or
- Online-help-to-book
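The two single-sourcing logics can be sketched in Python under simplifying assumptions. A monolithic "book" is a single Markdown string split at its level-2 headings to produce help pages (book-to-online-help), whereas modular topics are kept as separate strings and concatenated in a chosen order to produce a book (online-help-to-book). Both function names are hypothetical.

```python
import re


def book_to_help(book: str) -> dict[str, str]:
    """Split a monolithic Markdown book into one help page per '## ' heading.

    Text before the first '## ' heading is ignored in this sketch.
    """
    pages = {}
    # Split just before every line that starts with '## ' (zero-width match).
    for chunk in re.split(r"(?m)^(?=## )", book):
        if chunk.startswith("## "):
            title = chunk.splitlines()[0][3:].strip()
            pages[title] = chunk.strip()
    return pages


def help_to_book(topics: dict[str, str], order: list[str]) -> str:
    """Assemble standalone topics into a single book, in the given order."""
    return "\n\n".join(topics[name].strip() for name in order)


book = "# Guide\n\n## Install\nRun setup.\n\n## Configure\nEdit the file.\n"
print(book_to_help(book))
```

In the first case the book is the reference and the help pages are derived from it; in the second, the modules are the reference and the book is only one possible assembly of them.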
The available formats can be classified according to the following table:
| Format | Structured |
|---|---|
| FrameMaker | No |
| DocBook | Yes |
| DITA XML | Yes |
FrameMaker and DocBook are not entirely modular, as the smallest information units that can be manipulated are not generic: they contain elements such as table of contents structure or cross-references that are only valid in limited contexts.
Monolithic or modular documents
The source format can be based on either monolithic files or clusters of modular files.
Monolithic files (e.g., MS Word, LibreOffice, or FrameMaker) centralize all content in a single, easily manageable file, but limit content sharing; this increases the risk of inconsistent or duplicate information.
[Figure: Monolithic technical writing source format]
Clusters of modular files (e.g., DITA XML) aggregate the content of multiple files, facilitating content sharing and reuse. This approach is challenging to implement company-wide but should be standard for a technical writing team.
[Figure: Modular technical writing source format]
Some word processors can handle modular documents, though not effectively. Conversely, a DocBook or DITA XML document can be monolithic, but then loses its flexibility.
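Content reuse in a cluster of modular files can be sketched in Python as follows. Each deliverable is described by a list of module files (a simplified stand-in for a map), and a shared safety module is referenced by two deliverables. The file names and the `build` function are hypothetical; the point is that editing the shared module once updates every deliverable that references it.

```python
import tempfile
from pathlib import Path


def build(topic_map: list[str], topic_dir: Path) -> str:
    """Concatenate the module files listed in topic_map into one deliverable."""
    return "\n\n".join(
        (topic_dir / name).read_text(encoding="utf-8").strip() for name in topic_map
    )


with tempfile.TemporaryDirectory() as tmp:
    topics = Path(tmp)
    # Hypothetical modules; 'safety.md' is shared by both deliverables.
    (topics / "safety.md").write_text("Disconnect the power supply first.\n", encoding="utf-8")
    (topics / "install.md").write_text("Mount the unit on the wall.\n", encoding="utf-8")
    (topics / "repair.md").write_text("Replace the faulty board.\n", encoding="utf-8")

    installation_guide = ["safety.md", "install.md"]
    repair_manual = ["safety.md", "repair.md"]

    # A single edit to the shared module...
    (topics / "safety.md").write_text(
        "Disconnect and lock out the power supply first.\n", encoding="utf-8"
    )

    # ...is picked up by every deliverable that references it.
    guide_text = build(installation_guide, topics)
    repair_text = build(repair_manual, topics)

print(guide_text)
print(repair_text)
```

In a monolithic file, the same safety warning would exist as two copies that must be kept in sync by hand.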
Markdown as a source format
Markdown is a modern source format widely used in technical writing. It is a lightweight, human-readable plain-text format that clearly separates content from formatting.
For instance:
- Headings are indicated by `#`, not by visual styles.
- Lists are marked with `-` or `*`, not graphic bullets.
- Bold or italic text is framed by `**` or `_`.
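Because these markers describe structure rather than appearance, a program can turn them into any target format. The following Python function is a deliberately tiny converter for the three constructs listed above (headings, `-`/`*` list items, `**bold**` and `_italic_`) to HTML. It is a toy written for illustration, not a CommonMark-compliant parser; for example, it would wrongly italicize part of an identifier such as `file_name_x`.

```python
import html
import re


def md_to_html(markdown: str) -> str:
    """Convert headings, '-'/'*' list items, and **bold**/_italic_ to HTML.

    Toy converter for illustration only; real converters follow CommonMark.
    """
    out, in_list = [], False
    for line in markdown.splitlines():
        text = html.escape(line)
        # Inline markup: **bold** first, then _italic_.
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
        item = re.match(r"[-*] (.*)", text)
        # Open or close the <ul> when entering or leaving a run of list items.
        if item and not in_list:
            out.append("<ul>")
            in_list = True
        elif not item and in_list:
            out.append("</ul>")
            in_list = False
        heading = re.match(r"(#{1,6}) (.*)", text)
        if item:
            out.append(f"<li>{item.group(1)}</li>")
        elif heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        elif text.strip():
            out.append(f"<p>{text}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


print(md_to_html("# Setup\n\n- Install **Git**\n- Clone the _repository_\n"))
```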
Markdown offers several advantages for technical writing:
- Simplicity: Easy to learn and use.
- Modularity: Each Markdown file can stand alone as an information module.
- Interoperability: Text files can be converted to HTML, PDF, DocBook, DITA, or even Word, thanks to generators like Pandoc.
- Traceability: As the files are plain text, they are ideally suited for versioning with Git or other source control systems.
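Traceability follows from the fact that plain-text changes can be compared line by line. Git computes its own diffs, but Python's standard `difflib` module is enough to show the idea: a one-word change in a Markdown file produces a short, readable diff, which a binary file cannot offer. The file name and contents are hypothetical.

```python
import difflib

old = "# Install\n\n- Download the installer.\n- Run setup.exe.\n"
new = "# Install\n\n- Download the installer.\n- Run setup.exe as administrator.\n"

# Unified diff, the same presentation used by Git and most review tools.
diff = difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="install.md (v1)",
    tofile="install.md (v2)",
)
diff_text = "".join(diff)
print(diff_text)
```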
Markdown enables the creation of structured, reusable content while remaining simple and accessible to technical writers. It bridges conventional word processing (WYSIWYG, but not very modular or structured) and XML (modular and structured, but complex to manipulate).
What is an information module?
The world’s best-known modular system is undoubtedly the Lego brick. Applied to technical documentation, the module principle improves the quality of technical manuals and the productivity of technical writers.
However, is it enough to convert FrameMaker documentation to a structured format such as DITA XML or Markdown to obtain modular documentation? Unfortunately, no. If the original content mixes various types of information (concepts, step-by-step procedures, references), it can be converted to Markdown or DITA XML without the result strictly following the format’s semantics.
Moreover, even if your document ends up consisting of files that follow different schemas (concept, task, or reference), it is not necessarily truly modular documentation. Indeed, a file that contains only a single type of information risks being incomplete and incoherent when read on its own.
This documentation is not modular, as it lacks genuine information modules. A module is a complete and coherent atomic element that can be reused in different contexts. If the original monolithic document is merely split into numerous files, information modules have not yet been created. The next step is to rewrite each file (using the minimalist approach, for example) to make it more generic and turn it into a genuine module.
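A useful first pass before that rewrite is to find files that still depend on their original position in the book. The following Python sketch flags phrases such as "See next section" that only make sense in a fixed reading order. The pattern list and the function name are hypothetical starting points, not an exhaustive or standard rule set.

```python
import re

# Hypothetical patterns for wording that assumes a fixed reading order.
CONTEXT_DEPENDENT = [
    r"\bsee (the )?(next|previous|following) (section|chapter|page)\b",
    r"\b(as )?(mentioned|described|shown) (above|below|earlier)\b",
]


def find_context_dependencies(module_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that refer to the module's surroundings."""
    hits = []
    for number, line in enumerate(module_text.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in CONTEXT_DEPENDENT):
            hits.append((number, line))
    return hits


module = "Tighten the screws.\nSee next section for calibration.\n"
print(find_context_dependencies(module))
# [(2, 'See next section for calibration.')]
```

Each hit marks a sentence to rewrite or to replace with a cross-reference managed outside the module.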
It’s crucial to adopt a structural approach and decide on each module’s content from the perspective of the overall document architecture. Likewise, context-dependent references such as “See next section” should be replaced by cross-references. Ideally, these cross-references should not be located in the content files themselves but in a relationship table (reltable) specific to each ditamap file.
In this way, modules are perfectly decontextualized, and structural information such as cross-references is placed in files without textual content.
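The relationship-table idea can be sketched in Python with the standard `xml.etree.ElementTree` module. The ditamap below uses the DITA `<reltable>`, `<relrow>`, `<relcell>`, and `<topicref>` elements; the topic file names are hypothetical. The function derives, for each topic, the related links implied by the table, linking a topic to the topics in the other cells of the same row. This approximates a DITA processor's default behavior only; it ignores attributes such as `linking` and `collection-type`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A minimal ditamap: the topics carry no links; relationships live in <reltable>.
DITAMAP = """<map>
  <topicref href="filter-concept.dita"/>
  <topicref href="replace-filter.dita"/>
  <topicref href="filter-specs.dita"/>
  <reltable>
    <relrow>
      <relcell><topicref href="filter-concept.dita"/></relcell>
      <relcell><topicref href="replace-filter.dita"/></relcell>
      <relcell><topicref href="filter-specs.dita"/></relcell>
    </relrow>
  </reltable>
</map>"""


def related_links(map_xml: str) -> dict[str, list[str]]:
    """Link each topic to the topics in the other cells of its relrow."""
    links = defaultdict(list)
    for row in ET.fromstring(map_xml).iter("relrow"):
        cells = [[ref.get("href") for ref in cell.iter("topicref")]
                 for cell in row.findall("relcell")]
        for i, cell in enumerate(cells):
            for source in cell:
                for j, other in enumerate(cells):
                    if i != j:  # topics in the same cell are not linked here
                        links[source].extend(other)
    return dict(links)


links = related_links(DITAMAP)
print(links["replace-filter.dita"])
# ['filter-concept.dita', 'filter-specs.dita']
```

Because the links are computed from the map, the same topic can appear in another map with a different reltable and receive different related links without any change to its content.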
Binary or text files
Source formats can be binary or text.
- Binary formats are opaque: if opened with a text editor like Notepad, they display a series of indistinct characters and can typically only be modified with specific software.
- Text formats are transparent: if opened with a text editor, they display text and tags, so they can be modified with a variety of software, processed in batches on the command line, and searched or edited with powerful regular expressions.
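As an example of batch processing, the following Python sketch applies one regular expression to every Markdown file in a folder, here to rename a product consistently. The product names, file names, and the `batch_replace` function are hypothetical; the same operation would be impossible on a binary format without its dedicated software.

```python
import re
import tempfile
from pathlib import Path


def batch_replace(folder: Path, pattern: str, replacement: str) -> int:
    """Apply a regex substitution to every .md file in folder; return files changed."""
    changed = 0
    for path in sorted(folder.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        new_text = re.sub(pattern, replacement, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "install.md").write_text("Install AcmeDB 2.0.\n", encoding="utf-8")
    (folder / "faq.md").write_text("No product name here.\n", encoding="utf-8")
    # \b word boundaries avoid touching longer names such as 'AcmeDBTools'.
    count = batch_replace(folder, r"\bAcmeDB\b", "Acme Database")
    result = (folder / "install.md").read_text(encoding="utf-8")

print(count, result)
```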
Markdown exemplifies a transparent, simple, modular, and easily traceable text format.