Software Documentation in SGML or XML
Markup Languages and their Advantages
SGML is an old standard and the abbreviation stands for "Standard Generalized Markup Language". It is widely used in professional text processing, but also in areas where the user is not even aware that he uses SGML.
The commonly used HTML format for web-sites is an application of SGML; the abbreviation stands for "Hypertext Markup Language". XML, the "Extensible Markup Language", is one of the current buzzwords, but it too is only a derivative of SGML, the mother of all markup languages. XML was an attempt to define a simplified subset of SGML and, above all, to make it more popular than the old SGML standard. XML is so similar to SGML that most editors can work with both kinds of files without problems.
Now what is a markup language? There are a few things which characterize it:
- The files are in ASCII format, i.e. they are simple text files without any embedded binary data. They can therefore be used on any computer platform.
- The file contains markups that enclose portions of text in order to describe them. A simple example is: <b>this text shall appear bold</b>. The constructs in angle brackets, like the <b> in the example, define e.g. the style of a text portion.
A markup always has a start tag, e.g. <b>, and an end tag, which is </b> in our example.
The start tag switches the desired format ON and the end tag switches it OFF again. Formats can usually be nested, i.e. you can make a text bold and underlined as in this example: <b><u>this text shall appear bold and underlined</u></b>
- There is always some kind of "document type definition", or DTD, which defines the markups and the structure of the document. You can define the markups freely and also define how they may be nested. For your own definitions you can either put a DTD section at the beginning of a file or reference an external DTD file. For HTML the document type definition is fixed by an international standard: you do not need to define anything, and you are not able to define anything. The DTD is embedded in every internet browser, and when you open a web-site the browser interprets the markups in the file to display the text and images on the screen according to the definitions in this embedded DTD.
- Markups can be purely text descriptive, as in HTML, but they can also be defined as structural elements that represent the structure of the document. In fact this is the aim of using XML in web-sites. The HTML standard only defined the textual representation but no structures whatsoever; everything was sequential. With XML you can now also structure your web-sites, e.g. with sections and sub-sections.
- A printable version of a markup text file always has to be produced by means of a style sheet. The style sheet basically does the same as an internet browser, i.e. it interprets the markup to control how the text and images are displayed. However, instead of presenting the result on the screen, the style sheet controls a print processing software which transforms the markup text file into a file in a printable format.
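The nesting of start and end tags described above can be made visible with a few lines of code. The following is only a minimal sketch: it assumes that the markup is well-formed XML so that Python's standard xml.etree.ElementTree module can read it, and the function name active_formats is our own invention. For every piece of text it reports which tags are switched ON at that point.

```python
import xml.etree.ElementTree as ET

# The nested example from the text: bold and underlined.
SAMPLE = "<b><u>this text shall appear bold and underlined</u></b>"


def active_formats(markup):
    """Return a list of (text, enclosing tags) pairs, one per text run.

    Assumption: 'markup' is well-formed XML with a single root element.
    """
    runs = []

    def walk(element, stack):
        # Entering an element corresponds to its start tag: format ON.
        stack = stack + [element.tag]
        if element.text and element.text.strip():
            runs.append((element.text.strip(), stack))
        for child in element:
            walk(child, stack)
            # Text after the child's end tag belongs to the parent again.
            if child.tail and child.tail.strip():
                runs.append((child.tail.strip(), stack))

    walk(ET.fromstring(markup), [])
    return runs


print(active_formats(SAMPLE))
```

A browser or a print processor does conceptually the same walk, but maps each tag to a display or print instruction instead of just listing it.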
The use of markup languages may be unusual for the average user of Windows based text processing software, especially since you do not get the "what you see is what you get" feeling. But the advantages of markup languages are obvious:
- Portability across all computer platforms.
- No headache with slowed down tools or even crashes for big files, because the output formatting is not part of the file.
- Concentration on the contents while you edit the texts, rather than wasting time on formatting and re-formatting the text over and over again.
- Easy replacement of tags by "search and replace" for different DTDs. Thus texts can be reused in different environments.
- Clear structure in your documents and the possibility to restructure them. Templates, for example, can be implemented in DTDs very easily, so that everybody knows where to put which information.
- Various control possibilities in the print processor, e.g. to leave structure elements unprinted or to print them in various output formats.
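The "search and replace" of tags mentioned in the list can be sketched as follows. This is an illustration only: the tag names in TAG_MAP and the function name rename_tags are hypothetical and do not come from any particular DTD, and the pattern matches names case-sensitively, which is an XML-style assumption.

```python
import re

# Hypothetical mapping from the tag names of one DTD to those of another.
TAG_MAP = {"b": "emph", "sect": "section"}

# A start or end tag: "<", optional "/", a name, optional attributes, ">".
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)>")


def rename_tags(text, tag_map):
    """Rename start and end tags according to tag_map; keep everything else."""

    def swap(match):
        slash, name, rest = match.groups()
        # Tags without an entry in tag_map are left unchanged.
        return "<" + slash + tag_map.get(name, name) + rest + ">"

    return TAG_PATTERN.sub(swap, text)


print(rename_tags("<sect><b>Note</b> some text</sect>", TAG_MAP))
```

Because the files are plain text, such a replacement needs no special tool; any editor or script that can handle regular expressions will do.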
The Document Type Definition - DTD
As already mentioned, the DTD is a vital part of structured documentation. It is recommended to keep it as a separate file, rather than having a DTD section inside a text file.
The text file then only contains a reference to the DTD. This has to be the very first line in the file, as in this example:
<!DOCTYPE DOCUMENT SYSTEM "e:\tools\sgml\dtd\docu.dtd">
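To illustrate what a tool sees in such a reference line, the following sketch reads it with Python's standard xml.dom.minidom module. Note the assumptions: the file is treated as XML rather than SGML, the content of the DOCUMENT element is a placeholder of our own, and the parser only records the reference; it neither loads the DTD file nor validates the text against it.

```python
from xml.dom import minidom

# The DOCTYPE line from the text, followed by a placeholder root element.
# A raw string keeps the backslashes of the Windows path intact.
SOURCE = r'''<!DOCTYPE DOCUMENT SYSTEM "e:\tools\sgml\dtd\docu.dtd">
<DOCUMENT>placeholder content</DOCUMENT>'''

doc = minidom.parseString(SOURCE)
doctype = doc.doctype

# The name after <!DOCTYPE is the root element the document must start with.
print("root element:", doctype.name)
# The system identifier is the location of the external DTD file.
print("DTD location:", doctype.systemId)
```

The name given after <!DOCTYPE and the tag of the outermost element are the same, which is how the reference line ties the text file to the structure defined in the DTD.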
The DTD file itself then looks like the following example:
The following picture is a graphical representation of this DTD and its structure:
The so-called entity "%textopt" is the text descriptive part of the DTD. It constitutes a sub-set of the HTML standard and can be seen in the following picture:
We do not want to explain all the details of a DTD here, since the specifications as well as sufficient literature and examples are available on the web.
We do want to point out, however, that it is good style to separate textual and structural elements in a DTD.
The textual section can then be reused for different document structures. In our example the textual elements are compatible with HTML, although not all of the HTML standard is implemented here.
This has the big advantage that the SGML text can be reused as a web-site without any further editing or adaptation. The current internet browsers ignore the unknown structural tags and just display the rest, which conforms to the HTML standard. The following portion is a look inside the SGML text file using a very similar DTD: