DECdocument
Tutorial and
Application Guide



Chapter 12
HTML Publishing with DECdocument

This chapter provides information on HTML files and how to create them.

Starting with V3.2, DECdocument generates HTML output in color automatically. The use of color is intended to increase reader interest, and users can change the colors easily.

12.1 World Wide Web

DECdocument allows you to create files that can be published on-line on the World Wide Web (WWW).

The World Wide Web consists of a network of Web sites located all over the world. Once you gain access to the Web, you can use various Web browsers to locate and display information and documents from other sites.

To access the World Wide Web, a user enters a Web site address or URL (Uniform Resource Locator) such as:


        HTTP://WWW.TTINET.COM/

Entering this URL would cause the TTINET site page to be displayed. A page consists of one or more screens of site information.

If the entered URL were:


        HTTP://WWW.TTINET.COM/DOC/DOGS.HTML

the first page of the DOGS.HTML file would be displayed.

The Web browser uses URLs to locate site pages and on-line documents. A URL consists of the service name (HTTP), the domain name (WWW.TTINET.COM), and, optionally, a directory and file name (/DOC/DOGS.HTML). The browser controls the actual screen format of a document. The HTML file (for example, DOGS.HTML) contains text and HTML tags (H1, H2, etc.). The browser reads the HTML file and decides which fonts and font sizes to use and how graphics and pictures are displayed.
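
The URL parts described above can be separated with Python's standard urllib.parse module. This is only an illustrative sketch; the helper name url_parts is hypothetical and is not part of DECdocument.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Return (service name, domain name, directory and file name)."""
    parts = urlsplit(url)
    # urlsplit lowercases the service name; domain and path keep their case.
    return parts.scheme, parts.netloc, parts.path

print(url_parts("HTTP://WWW.TTINET.COM/DOC/DOGS.HTML"))
```

For the site page URL, the directory and file name part is just "/".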

12.2 The HTML Files

The on-line document files that the World Wide Web uses are called HTML (HyperText Markup Language) files. DECdocument creates HTML files for use on the Web from SDML files.

DECdocument processes an SDML file and creates one or more HTML files. These files are:


        file_name.HTML                  document body file 
        file_name_CONTENTS.HTML         optional - table of contents file 
        file_name_INDEX.HTML            optional - index file

Depending on the size of the contents, the document body, and the index (if there is one), each of these might be split into several HTML files. Each HTML file contains 20,000-30,000 bytes (characters) of data. The smaller files allow for faster access and display of data. For example, when HTML files are created for APPLICATION.SDML, the output HTML files are:


        Body files:             APPLICATION.HTML 
                                APPLICATION_001.HTML 
                                APPLICATION_002.HTML 
                                APPLICATION_003.HTML 
                                APPLICATION_004.HTML 
                                APPLICATION_005.HTML 
                                APPLICATION_006.HTML 
 
        Contents files:         APPLICATION_CONTENTS.HTML 
                                APPLICATION_CONTENTS_001.HTML 
 
        Index files:            APPLICATION_INDEX.HTML 
                                APPLICATION_INDEX_001.HTML 
                                APPLICATION_INDEX_002.HTML

Each HTML file is internally linked to the NEXT and PREVIOUS files, so any part of the document can be reached at all times.
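
The naming pattern in the listing above, and the NEXT and PREVIOUS chain, can be sketched in Python as follows. The function names are hypothetical, and the sketch links files within one part only (body, contents, or index); how the real converter links across parts is not described here.

```python
def segment_names(base, count, kind=""):
    """Names of `count` HTML files for one part of a document.

    kind is "" for the body, or "CONTENTS" or "INDEX" for the other parts.
    """
    stem = f"{base}_{kind}" if kind else base
    names = [f"{stem}.HTML"]                       # first file has no number
    names += [f"{stem}_{n:03d}.HTML" for n in range(1, count)]
    return names

def neighbours(names):
    """Map each file to its (previous, next) file; None at either end."""
    return {
        name: (names[i - 1] if i > 0 else None,
               names[i + 1] if i + 1 < len(names) else None)
        for i, name in enumerate(names)
    }

body = segment_names("APPLICATION", 7)
print(body[-1])
print(neighbours(body)["APPLICATION_001.HTML"])
```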

If the document contains a Table of Contents and/or an Index, users can select contents and index items.

The Table of Contents is broken into two sections. The summary section lists just the chapter names, which allows for quick movement to a chapter. Following the summary section is the expanded Table of Contents, which lists each chapter and all of its subheadings. This allows users to go to any section within a chapter.

The Index is also broken into two sections. The Master Alphabetic Index lists each letter used in the index. For example:


   | A | B | D | E | G | H | L | M | O | P | R | S | T | U | V | W 

When a letter is selected, the expanded index for that letter is displayed.
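
As a rough illustration of how a Master Alphabetic Index line could be derived from a set of index entries, consider the Python sketch below. The function name and the sample entries are hypothetical; the output format simply imitates the example line above.

```python
def master_index(entries):
    """Build a '| A | B | D' style line from the first letters in use."""
    letters = sorted({e[0].upper() for e in entries if e and e[0].isalpha()})
    return "| " + " | ".join(letters)

print(master_index(["dogs", "Apples", "breeds", "data"]))
```

Letters that begin no index entry (C in this sample) are simply absent from the line.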

12.3 Creating HTML Files

Creating HTML files is as easy as creating PS (PostScript) and TXT output files. The document command is identical except that the destination is HTML. Here are command examples:


        DOCUMENT DOGS.SDML GENERAL HTML/CONTENTS/INDEX 
 
        DOCUMENT DOGS.SDML SOFTWARE.REF HTML/CONTENTS/INDEX/CONDITION=TEXT

If either of the above commands was given, DECdocument would process the DOGS.SDML file and create:


        DOGS.HTML                  <-- document body 
        DOGS_001.HTML              <--   creation depends on size of body 
        DOGS_CONTENTS.HTML         <-- table of contents file 
        DOGS_INDEX.HTML            <-- index file 
        DOGS_INDEX_001.HTML        <--   creation depends on size of index

These HTML files could then be displayed on-line on the World Wide Web.

Note

DECdocument will produce HTML output from any SDML file that can be processed by the GENERAL doctype and the SOFTWARE.REFERENCE design type. In most cases, DECdocument can produce HTML output from other design types as well.

In the event that DECdocument cannot output the HTML file, you will receive appropriate error messages.

This restriction will be lifted in future versions of DECdocument.

12.3.1 Error Log File

If you attempt to build HTML files and the SDML file(s) contain tags that cannot be implemented (i.e. translated into HTML tags), DECdocument creates an error log file. The error log file lists the problem tag(s), file and line location information and the number of errors. For example:


  !--  Error log file for TESTDISK:[TESTER]TEST_PROFILE.HTML 
  Unimplemented Math command: lceil 
  characters \hbox{({\thinspace}\math{\lceil }{\thinspace})}. 
  TEX file :TESTDISK:[TESTER]TEST_FRONT.TEX  Line: 11805 
 
  Unimplemented Math command: lfloor 
  characters \hbox{({\thinspace}\math{\lfloor }{\thinspace})}. 
  TEX file :TESTDISK:[TESTER]TEST_FRONT.TEX  Line: 11811 
 
 
  Errors found:  2 

The name of the error log file is the SDML file name plus "_ERRORS.LOG". For example, if HTML errors were found in DOGS.SDML, the error log file would be DOGS_ERRORS.LOG; for TEST_PROFILE.SDML, the name would be TEST_PROFILE_ERRORS.LOG.
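
A log in the format shown above can be summarized with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, only the "Unimplemented ... command:" and "Errors found:" lines from the sample are recognized, and other message types that a real log might contain are ignored.

```python
import re

def summarize_error_log(text):
    """Return (list of unimplemented command names, reported error count)."""
    commands = re.findall(
        r"^[ \t]*Unimplemented \w+ command:[ \t]*(\S+)", text, re.M)
    match = re.search(r"^[ \t]*Errors found:[ \t]*(\d+)", text, re.M)
    reported = int(match.group(1)) if match else None
    return commands, reported

# Abbreviated copy of the sample log above.
sample = "\n".join([
    "  !--  Error log file for TESTDISK:[TESTER]TEST_PROFILE.HTML",
    "  Unimplemented Math command: lceil",
    "  TEX file :TESTDISK:[TESTER]TEST_FRONT.TEX  Line: 11805",
    "",
    "  Unimplemented Math command: lfloor",
    "  TEX file :TESTDISK:[TESTER]TEST_FRONT.TEX  Line: 11811",
    "",
    "  Errors found:  2",
])
print(summarize_error_log(sample))
```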

12.4 Converting Graphics Files

If you have graphics files that are PostScript images, you can convert these PS files to HTML-readable graphic image files (.GIF files). The PS files are converted by the DECdocument CONVERT_PS_TO_GIF.COM file.

When an SDML file containing one or more graphics files is processed, DECdocument automatically creates a conversion COM file. The following is an example of a created COM file that would convert APPLES.PS and ORANGES.PS to APPLES.GIF and ORANGES.GIF.


  $! 
  $! TESTDISK:[TESTER]TEST_PROFILE_FIGURES.COM 
  $! Convert Postscript image files to GIF 
  $! Created by DECdocument
  $! HTML file: TESTDISK:[TESTER]TEST_PROFILE.HTML 
  $! 
  $ if  f$getsyi("hw_model") .gt. 1024 
  $ then 
  $   arch = "AXP" 
  $ else 
  $   arch = "VAX" 
  $ endif 
  $ convert_ps_to_gif == "$doc$root:[ghostscript]gs_''arch'.exe 
  $ define gs_device "gif8" 
  $! 
  $ define gs_output_file "apples.gif" 
  $ convert_ps_to_gif testdisk:[tester]apples.ps 
  $     ! GIF image anchor is in: test_profile_002.html at line: 235 
  $ 
  $ define gs_output_file "oranges.gif" 
  $ convert_ps_to_gif testdisk:[tester]oranges.ps 
  $     ! GIF image anchor is in: test_profile_004.html at line: 502 
  $ 

When the PostScript image is converted to a GIF file, it is sized. Sizing trims the image to the minimum size by removing the surrounding white space that is in the PostScript file.

The name of the conversion COM file is the SDML file name plus "_FIGURES.COM". For example, if the SDML file is CATS.SDML, the conversion COM file would be CATS_FIGURES.COM; for TEST_PROFILE.SDML, the name would be TEST_PROFILE_FIGURES.COM.
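
Both companion files, the error log and the conversion COM file, are named by adding a fixed suffix to the SDML file name. A minimal Python sketch of that rule follows; the helper name is hypothetical, and the input is assumed to be a bare file name without a device or directory.

```python
def companion_name(sdml_name, suffix):
    """Replace the file type of `sdml_name` with `suffix`."""
    stem = sdml_name.rsplit(".", 1)[0]   # drop ".SDML"
    return stem + suffix

print(companion_name("CATS.SDML", "_FIGURES.COM"))
print(companion_name("DOGS.SDML", "_ERRORS.LOG"))
```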

12.5 SDML Tags

No HTML tags are required for creation of HTML files. However, several SDML tags have been created for adding HTML-specific information to the SDML file and for controlling the content of the HTML output files. The tags, each described in a section of this chapter, are <HTML_BACKGROUND>, <HTML>, <HTML_OPTIONS>, and <HTML_SEGMENT>.

For tags that take HTML data within parentheses, that data is only processed when the destination is HTML.

Note

DECdocument ignores the above HTML tags and the data included within the tags unless HTML files are being created (i.e. the destination is HTML).

In V3.2, two more SDML tags were added to aid in controlling HTML output: <HTML_HEADER> and <HTML_FOOTER>.

12.5.1 Using the <HTML_BACKGROUND> Tag

When creating an HTML file from an SDML file, the background tag can be used to specify a background color or graphic image for all of the HTML file pages. The background tag looks like this:


 
        <HTML_BACKGROUND>("/WHITE.GIF") 
 
When the above tag is used, it will set the background for all parts of the document to "white".

The background can be white (as in the above example) or some other color. The background can also be a graphic image. For example, if you had a graphic that had a picture of a dog or the word "Dogs" lightly showing through a marbled pattern, this could be used as a background. In this case, the graphic file could be called DOGS.GIF and the tag would be:


 
        <HTML_BACKGROUND>("/DOGS.GIF") 
 
HTML graphic image files have the extension of .GIF.

Starting with V3.2, DECdocument also allows an actual color name to be used. For example:


 
        <HTML_BACKGROUND>(TAN) 
 

The above tag causes the DECdocument HTML converter to use the given color as the background color.

If the <HTML_BACKGROUND> tag is used, it MUST be the first tag in the SDML file. If you are using a profile SDML file with multiple SDML files, the tag must be the first tag in the profile file.

12.5.2 Using the <HTML> Tag

The <HTML> tag is used when you want to include HTML specific data in your SDML file. The format of this tag is:


 
        <HTML>(.....) 
 

The data within the parentheses can be any HTML tag/text combinations. For example:


 
        <HTML>(<EM>This is just displayed in my HTML output</EM>) 
 

The <HTML> tag is commonly used to set up a hot-spot, where additional information is displayed when the WWW user clicks on highlighted text. The HTML tags and text within the parentheses identify the text to highlight and the address of the data that is displayed when the user clicks on it.

Here is the hot-spot format:


 
        <HTML>(<A HREF="/directory_location/file_name.HTML">) 
        anchor_text 
        <HTML>(</A>) 
 
Here is an example of a hot-spot in an SDML file:


 
        <p> 
        There are many types of 
        <html>(<a href="/doc/dog_breeds.html">) 
        dog breeds 
        <html>(</a>) 
        in the United States and all over the world. 
 
where DOC is the directory address of the DOG_BREEDS.HTML file and "dog breeds" is the highlighted anchor text displayed on the WWW user's screen.
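
If many hot-spots have to be written, the three SDML lines can be produced by a small helper. The Python sketch below only assembles text in the hot-spot format shown above; the function name is hypothetical and nothing here is part of DECdocument itself.

```python
def hot_spot(href, anchor_text):
    """Return the three SDML lines that make up one hot-spot."""
    return "\n".join([
        f'<html>(<a href="{href}">)',   # opening anchor, HTML destination only
        anchor_text,                    # ordinary SDML text, always processed
        "<html>(</a>)",                 # closing anchor
    ])

print(hot_spot("/doc/dog_breeds.html", "dog breeds"))
```

Because the anchor text sits outside the <HTML> tags, it still appears in PS and TXT output, where the two <HTML> tags are ignored.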

The HTML tag can be used to insert any HTML-specific text or tags into an HTML document. For example, you can include images and/or other HTML-specific constructs.

12.5.3 Using the <HTML_OPTIONS> Tag

The <HTML_OPTIONS> tag allows you to override some of the default behavior of the DECdocument HTML converter. Several options can be specified; this section describes MANUAL_SEGMENTATION and UNNUMBERED.

MANUAL_SEGMENTATION Option

Normally, DECdocument creates HTML files in sizes which allow for fast access and display. If the MANUAL_SEGMENTATION option is specified, the DECdocument HTML converter will not automatically break apart the body of the document. However, the HTML converter will still automatically break up the index depending on its size.

The format for this option is:

        <HTML_OPTIONS>(MANUAL_SEGMENTATION)

UNNUMBERED Option

The UNNUMBERED option causes the DECdocument HTML converter to drop section numbers. If this option is used, only section titles are displayed, not the section name (for example, the word Chapter) or its number (for example, 1.2.1). The affected sections are:

The format for this option is:

        <HTML_OPTIONS>(UNNUMBERED)

Both the MANUAL_SEGMENTATION and UNNUMBERED options can be used at the same time. For example:

        <HTML_OPTIONS>(MANUAL_SEGMENTATION,UNNUMBERED)

12.5.4 Using the <HTML_SEGMENT> Tag

The <HTML_SEGMENT> tag is used to force the start of a new HTML file. When used in conjunction with the <HTML_OPTIONS>(MANUAL_SEGMENTATION) tag, you have complete control over HTML file segmentation.

The segment name you provide will be appended to the output file name for the DECdocument HTML converter and used as the file name for the new HTML file. Here is an example of how to use the <HTML_SEGMENT> tag.

If the SDML file contains:


 
        .............text......................... 
        .............text......................... 
 
        <html_segment>("apples") 
        .............text......................... 
        .............text......................... 
 
        <html_segment>("oranges") 
        .............text......................... 
        .............text......................... 
 

and the following document command is given:

        $ document fruit.sdml general html

these HTML files will be created:

12.5.5 Using the <HTML_HEADER> and <HTML_FOOTER> Tags

These new V3.2 SDML tags are used to control top/header and bottom/footer text in HTML output.

These two tags are placed at the top of the SDML file. If you are using a profile SDML file with multiple SDML files, these tags must be at the top of the profile SDML file.

The <HTML_HEADER> Tag

<HTML_HEADER> causes the given HTML code to be inserted at the top/header of each generated HTML page.

The following example shows how a logo can be placed at the top of each HTML output page:


 
        <html_header>( 
        <literal> 
        <img src="/images/tti_logo.gif"> 
        <br clear=all> 
        <hr> 
 
        <endliteral>) 

Notice that the <LITERAL> and <ENDLITERAL> tags are used as part of the HTML code; that is, they are inside the parentheses. The "literal" tags allow you to specify any HTML code without having DECdocument attempt to process the tags as DECdocument tags.

The <HTML_FOOTER> Tag

<HTML_FOOTER> causes the given HTML code to be inserted at the bottom/foot of each generated HTML page.

The following example shows how some legal text can be placed at the bottom of each HTML output page:


 
        <html_footer>( 
        <literal> 
        <hr> 
        <a href="/legal/legal_notice.html">Legal Notice</a> 
 
        <endliteral>) 

Notice that the <LITERAL> and <ENDLITERAL> tags are used as part of the HTML code; that is, they are inside the parentheses. The "literal" tags allow you to specify any HTML code without having DECdocument attempt to process the tags as DECdocument tags.

12.6 HTML Options Logical

In V3.2, a new logical was added to DECdocument which is used to set up HTML options. The new logical DOC$HTML_OPTIONS can be used to set up any of the options that are currently set up using the SDML <HTML_OPTIONS> tag. The DOC$HTML_OPTIONS logical is defined as follows:


        $ DEFINE DOC$HTML_OPTIONS "HTML_option, HTML_option, etc...." 

For example:


        $ DEFINE DOC$HTML_OPTIONS "UNNUMBERED, REVISION COLOR BLUE" 

DECdocument processes the HTML options logicals and tags in this order:

  1. DOC$HTML_OPTIONS logical
  2. individual HTML logicals, for example DEFINE DOC$HTML_FILESIZE 15000
  3. <HTML_OPTIONS> tag, for example <HTML_OPTIONS>(UNNUMBERED)
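
One way to picture this processing order is as three sets of options merged one after another. The Python sketch below assumes that a setting processed later replaces the same setting processed earlier; this guide does not state that explicitly, so treat it as an assumption. The function name and the dictionary representation of options are hypothetical.

```python
def effective_options(options_logical, individual_logicals, options_tag):
    """Merge option sources in the documented processing order.

    ASSUMPTION: a later source overrides an earlier one.
    """
    merged = {}
    for source in (options_logical, individual_logicals, options_tag):
        merged.update(source)
    return merged

print(effective_options(
    {"UNNUMBERED": True, "FILESIZE": 20000},   # from DOC$HTML_OPTIONS
    {"FILESIZE": 15000},                       # from DOC$HTML_FILESIZE
    {"MANUAL_SEGMENTATION": True},             # from <HTML_OPTIONS>
))
```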

12.7 HTML File Sizes and Names

DECdocument allows you to control the size of HTML files and the length of HTML file names.

12.7.1 Changing HTML File Sizes

By default, DECdocument tries to break documents into files of 20K-30K bytes. If you wish to override this behavior, define the logical DOC$HTML_FILESIZE. For example:


        $ DEFINE DOC$HTML_FILESIZE 15000 

The above command would cause DECdocument to try to break documents into files of 15K-20K bytes.

Before the DECdocument HTML converter starts processing data, it checks the value of the logical DOC$HTML_FILESIZE. If the logical is not defined, a value of "20000" is used. If the body of the document is longer than the FILESIZE, DECdocument searches for a logical break point, such as the start of a chapter or section. After the break point is located, the converter closes the current HTML file (for example, APPLICATION.HTML) and opens a new HTML file (for example, APPLICATION_001.HTML). This procedure is repeated until all of the document data is processed.

DECdocument attempts to use chapters as logical break points. If no chapter break is found and the HTML file size has reached 125% of the FILESIZE value, the converter searches for the next heading. If no heading is found and the file size has reached 150% of the FILESIZE value, the converter searches for the next paragraph.
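
The break-point rules above can be summarized as a small decision function. The Python sketch below is an interpretation of the description, not the converter's actual code; the function name and the element labels "chapter", "heading", and "paragraph" are hypothetical.

```python
DEFAULT_FILESIZE = 20000   # used when DOC$HTML_FILESIZE is not defined

def should_break(size, element, filesize=DEFAULT_FILESIZE):
    """Decide whether to start a new HTML file before `element`.

    size    -- bytes already written to the current HTML file
    element -- "chapter", "heading", or "paragraph"
    """
    if size <= filesize:
        return False                       # file is not yet too long
    if element == "chapter":
        return True                        # preferred break point
    if element == "heading":
        return size * 100 >= filesize * 125
    if element == "paragraph":
        return size * 100 >= filesize * 150
    return False

print(should_break(25000, "heading"))
print(should_break(29000, "paragraph"))
```

With the default FILESIZE, a heading is accepted as a break point from 25,000 bytes and a paragraph from 30,000 bytes, which matches the stated default range of 20K-30K bytes.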

12.7.2 Shortening HTML File Names

By default, the name of the HTML file is the name of the SDML file or a specified output file name (for example, /output=my_file) plus the extension ".HTML". You can override this behavior and shorten the output HTML file names to five characters by defining the logical DOC$HTML_SHORT_FILENAMES. For example:


        $ DEFINE DOC$HTML_SHORT_FILENAMES TRUE 
 
        $ DEFINE DOC$HTML_SHORT_FILENAMES YES 

If the DOC$HTML_SHORT_FILENAMES logical is defined, the DECdocument HTML converter will truncate the output HTML file names to the first five characters of the supplied name (i.e. the name of the SDML file or the argument to the /OUTPUT qualifier). Also, the ".HTML" extension will be shortened to ".HTM".

Here are two examples:


     SDML file:           APPLICATION.SDML         APP_PROFILE.SDML 
 
 
     Body files:          APPLI.HTM                APP_P.HTM 
                          APPLI001.HTM             APP_P001.HTM 
                          APPLI002.HTM             APP_P002.HTM 
 
     Contents files:      APPLITOC.HTM             APP_PTOC.HTM 
                          APPLIT01.HTM             APP_PT01.HTM 
 
     Index files:         APPLIIDX.HTM             APP_PIDX.HTM 
                          APPLII01.HTM             APP_PI01.HTM 

The shortened HTML file names conform to the PC (MS-DOS 8.3) file naming standard: at most eight characters in the name and three in the extension.
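
The pattern in the two examples above can be reproduced with the Python sketch below. The suffix rules (TOC and Tnn for contents, IDX and Inn for index, nnn for body files) are inferred from the table only, and the function name is hypothetical.

```python
def short_name(name, part="body", segment=0):
    """Short HTML file name for segment `segment` of one document part.

    part is "body", "contents", or "index"; segment 0 is the first file.
    """
    stem = name[:5].upper()                # first five characters only
    if part == "body":
        suffix = "" if segment == 0 else f"{segment:03d}"
    else:
        first = {"contents": "TOC", "index": "IDX"}[part]
        letter = {"contents": "T", "index": "I"}[part]
        suffix = first if segment == 0 else f"{letter}{segment:02d}"
    return f"{stem}{suffix}.HTM"

print(short_name("APPLICATION", "contents", 1))
print(short_name("APP_PROFILE", "body", 2))
```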

