expand.tcl

SYNOPSIS

    tclsh expand.tcl [options] files...

VERSION

This manpage is for Expand 2.2. Changes from earlier versions are described in CHANGES SINCE VERSION 2.0.

DESCRIPTION

Expand is a macro processor, based on Tcl 8.0 or later. Expand reads the input files and copies their text to the output. Most text is output unchanged. Any text contained in square brackets, "[" and "]", is expanded as a macro. The result of the expansion is written to the output.
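
The bracket-expansion idea can be tried out in plain Tcl, without Expand itself. The following is only an analogy, not Expand's implementation: it uses Tcl's built-in subst command, which likewise replaces bracketed commands with their results. The shout proc and the sample text are made up for illustration.

```tcl
# A hypothetical macro: an ordinary Tcl proc that returns text.
proc shout {text} {
    return [string toupper $text]
}

# Sample input; the braces keep Tcl from expanding the brackets early.
set input {Most text is unchanged, but [shout "this part"] is expanded.}

# subst performs the command substitution. Variable and backslash
# substitution are switched off so that only brackets are special.
set output [subst -novariables -nobackslashes $input]
puts $output
```

Run with tclsh, this prints the sample line with "this part" replaced by "THIS PART".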

Any valid Tcl command is a valid macro; this document will use the term "macro" for Tcl commands intended to be used within an Expand input file, and "command" for Tcl commands intended to be called from other Tcl code. Before reading any input files, Expand reads any exprules.tcl file in the current directory. This file, called the "rules" file, can define any additional macros needed by the document being processed.

Output can be written to standard output, directed to a file, or suppressed entirely.

Expand can be used to process any kind of input text, but was specifically designed with HTML preprocessing in mind. It is especially useful in maintaining large websites with a significant amount of boilerplate used by many pages. Changing the boilerplate is as easy as redefining an Expand macro. One could define a navigation bar macro, for example, with one argument: the identity of the current page. The macro would format the navigation bar appropriately, in context.
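
As a rough sketch of the navigation bar idea, a rules file might contain something like the following. The page identifiers, file names, titles, and the HTML layout are all assumptions made up for this example; only the approach (a macro taking the identity of the current page) comes from the description above.

```tcl
# Hypothetical site map: page identifier, then {file title}.
set navPages {
    home     {index.html    Home}
    docs     {docs.html     Documentation}
    download {download.html Download}
}

# Returns a one-line navigation bar. The current page is shown as
# bold text; every other page is shown as a link.
proc navbar {current} {
    global navPages
    set items {}
    foreach {id info} $navPages {
        set file  [lindex $info 0]
        set title [lindex $info 1]
        if {[string compare $id $current] == 0} {
            lappend items "<b>$title</b>"
        } else {
            lappend items "<a href=\"$file\">$title</a>"
        }
    }
    return [join $items " | "]
}

# Standalone check; in an input file this would be written [navbar docs].
puts [navbar docs]
```

Changing the look of the bar on every page then means editing this one proc and running the pages through Expand again.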

OPTIONS

-help
Displays help text.

-rules file
Specifies the rules file explicitly. If no file is specified, exprules.tcl, if present, will be used.

-out file
Specifies the name of the output file. If the file is nul, no output will be written. If no file is specified, output will be written to stdout. The output file can be changed at any time during processing using the setoutput command.

-errout mode
Specifies what to output when an error occurs during expansion of a macro. Possible values of mode are nothing, macro, error or fail; fail is the default.

If mode is nothing, the erring macro will produce no output. If mode is macro, the macro and its arguments will be output unchanged. If mode is error, the macro and its arguments will be output followed by an error message; this mode is especially useful for locating errors. If mode is fail, the error message will be written to standard error, and Expand will halt.

-web
Enables the optional WEB RULE SET.

DEFINING MACROS

An Expand macro is nothing more nor less than a Tcl command. Describing the Tcl language is beyond the scope of this web page; visit Scriptics, Inc. for more information about Tcl.

Expand macros can do anything you like, but the typical use is to format and return text. The following simple example shows how to highlight text with asterisks:

    proc highlight {text} {
        return "*** $text ***"
    }

If some other kind of highlighting is desired later on, this macro can be changed, and all documents that use it can be updated just by running them through Expand.

It's important that the macro return the formatted text, rather than printing it out. Expand replaces text in the input with the result of this macro, and doesn't print it out until the entire file is complete. If this macro wrote its formatted text to the output itself, it would be written out of order.

Sometimes macros are executed for their side effects, rather than to format output. Such macros should always end with an explicit return, so that they don't add erroneous output. The following macro adds a word to a global list; if the return command weren't used, the entire list would be written to the output:

    proc saveword {word} {
        global wordList

        lappend wordList $word
        return
    }

User-defined macros are typically placed in a rules file, such as exprules.tcl.

HOOKS

In addition to macros, the rules file can also redefine one or more hooks. Hooks are Tcl commands defined and called by Expand that have no effect by default, but can be redefined for particular purposes. Expand defines the following hooks:

init_hook
This command is executed just after the rules file is read, and before the output file is opened. If the rules require any initialization, it should be done in the init_hook, rather than inline in the rules file. A common use for the init_hook is parsing rules-specific options from the command line using the getoptions command.

The result of init_hook is never added to the output.

begin_hook
This command is executed after the output file (if any) is opened, but before any files on the command line are processed. The result of the hook is added to the output.

end_hook
This command is executed after all files on the command line are processed. The result of the hook is added to the output.

begin_file_hook fileName
This hook is executed just before an input file is expanded. Its result is added to the output.

end_file_hook fileName
This hook is executed just after an input file is expanded. Its result is added to the output.

raw_text_hook text
Expand operates by breaking its input into a stream of blocks; each block is either a macro to expand or a block of raw text to be output unchanged. If this hook is defined, each block of raw text is passed to it before being output; the hook should transform the raw text in any desired way and return the transformed text.
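
As an illustration, a rules file could redefine the two per-file hooks to mark where each input file starts and ends in the output. This is a minimal sketch: the HTML comment format is an assumption chosen for the example, and the last two lines only exercise the procs outside Expand.

```tcl
# Called just before each input file is expanded; the returned
# text is added to the output ahead of the file's own content.
proc begin_file_hook {fileName} {
    return "<!-- begin $fileName -->\n"
}

# Called just after each input file is expanded; the returned
# text is added to the output after the file's content.
proc end_file_hook {fileName} {
    return "<!-- end $fileName -->\n"
}

# Standalone check of what the hooks return for a sample file name.
puts -nonewline [begin_file_hook intro.in]
puts -nonewline [end_file_hook intro.in]
```

Like macros, these hooks return their text rather than printing it.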

PARSING THE COMMAND LINE

Expand parses its own options from the command line, loads the rules file (if any), calls the init_hook, verifies that no unknown options remain, and assumes that any remaining command line arguments are the names of input files. Thus, a rules file can define its own command line options or in any other way modify the command line in its init_hook. The easiest way to do this is to use the getoptions command.

Suppose the rules will output summary information to a file if the "-summary" option is specified; the information can be "brief" or "verbose"; furthermore, the rules can be "-strict" in their interpretation of particular macros. The following init_hook will parse and remove the necessary options from the command line.

    proc init_hook {} {
        global argv strictFlag summaryFile summaryMode

        # Parse the options
        getoptions argv {
            {-strict  strictFlag  flag}
            {-summary summaryFile string ""}
            {-mode    summaryMode enum brief verbose}
        }

        # Open the summary file
    }

If $argv was "-summary out.text foo.in bar.in" before the init_hook was called, it will be "foo.in bar.in" afterward.

Presumably the macros in the rules file modify their behavior based on the strictFlag, summaryFile, and summaryMode variables.

CHANGING THE BRACKETS

Expand was initially developed to preprocess HTML documents; for that purpose, the "[" and "]" characters work just fine to bracket commands, as well as being familiar to Tcl programmers. For preprocessing other kinds of files, however, other bracket tokens might work better. The setbrackets command can change the bracket tokens to something more suitable: "{" and "}", or "(*" and "*)", or even "%" and newline. The only requirements are that the two tokens differ and that neither is the empty string.

MULTI-PASS PROCESSING

Consider a document with numbered sections: given an appropriate section command, it's easy to generate the numbers automatically. Now consider producing a table of contents at the beginning of the document, with links to the sections. It cannot be done in one pass through the input; at least two passes are needed. The first pass accumulates the table of contents information, and the second actually formats the document, inserting the table of contents in its proper place.

Expand supports this kind of processing via the setpasses and exppass commands. setpasses is called from the init_hook and sets the total number of passes (the minimum, obviously, is 1). During processing, exppass returns the number of the current pass; the first pass is pass 1. When there is more than one pass, Expand's algorithm is more complicated:

Note that by default, no output is produced until the final pass. The macros used in the input files can modify their behavior based on the value of exppass; a section macro, for example, would add information to a global table-of-contents list on pass 1 and return the formatted section header on pass 2.
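
A two-pass section macro along these lines might be sketched as follows. The macro names section and contents, the HTML, and the assumption that section titles are unique are all illustrative. The stand-in exppass and the currentPass variable exist only so the sketch runs in plain tclsh; inside Expand the real exppass would be used, and the init_hook would call setpasses 2.

```tcl
# Stand-in for Expand's exppass so the sketch runs on its own.
if {[llength [info commands exppass]] == 0} {
    set currentPass 1
    proc exppass {} {
        global currentPass
        return $currentPass
    }
}

set tocEntries {}   ;# section titles, collected on pass 1

# Hypothetical section macro: records the title on pass 1 and
# returns a numbered header. Assumes titles are unique.
proc section {title} {
    global tocEntries
    if {[exppass] == 1} {
        lappend tocEntries $title
    }
    set number [expr {[lsearch -exact $tocEntries $title] + 1}]
    return "<h2>$number. $title</h2>"
}

# Hypothetical macro returning the table of contents; it is empty
# on pass 1 and complete on pass 2.
proc contents {} {
    global tocEntries
    set lines {}
    set number 0
    foreach title $tocEntries {
        incr number
        lappend lines "$number. $title"
    }
    return [join $lines "\n"]
}

# Simulate two passes over a document with two sections.
foreach currentPass {1 2} {
    set body "[contents]\n[section Introduction]\n[section Usage]"
}
puts $body
```

Only the text produced on the second simulated pass is printed, which mirrors the default behavior of producing no output until the final pass.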

THE CONTEXT STACK

The context stack is a new feature in Expand 2.0, and solves the following problem. Often it's desirable to define a pair of macros which operate in some way on the raw text between them. Consider a set of macros for adding footnotes to a web page: in Expand 1.x, one might have implemented something like this:

    Dr. Pangloss, however, thinks that this is the best of all
    possible worlds.[footnote "See Candide, by Voltaire"]

The footnote macro would, presumably, assign a number to this footnote and save the text to be formatted later on. However, this solution is ugly if the footnote text is long or should contain additional markup. Consider the following instead:

    Dr. Pangloss, however, thinks that this is the best of all
    possible worlds.[footnote]See [bookTitle "Candide"], by
    [authorsName "Voltaire"], for more information.[/footnote]

Here the footnote text is contained between footnote and /footnote macros, continues onto a second line, and contains several macros of its own. This is both clearer and more flexible; however, in Expand 1.3 there was no easy way to do it. The footnote text would have been expanded into the output in place. In Expand 2.0, however, the footnote macro pushes a new context onto the context stack. Then, all expanded text gets placed in that new context. /footnote retrieves it by popping the context. Here's a skeleton implementation of these two macros:

    proc footnote {} {
        cpush footnote
    }

    proc /footnote {} {
        set footnoteText [cpop footnote]

        # Save the footnote text, and return an appropriate footnote
        # number and link.
    }

The cpush command pushes a new context onto the stack; the argument is the context's name. It can be any string, but would typically be the name of the macro itself. Then, cpop verifies that the current context has the expected name, pops it off of the stack, and returns the accumulated text.

Expand provides several other tools related to the context stack. Suppose the first macro in a context pair takes arguments or computes values which the second macro in the pair needs. After calling cpush, the first macro can define one or more context variables; the second macro can retrieve their values anytime before calling cpop. For example, suppose the document must specify the footnote number explicitly:

    proc footnote {footnoteNumber} {
        cpush footnote
        csave num $footnoteNumber
        # Return an appropriate link
    }

    proc /footnote {} {
        set footnoteNumber [cget num]
        set footnoteText [cpop footnote]

        # Save the footnote text and its footnoteNumber for future
        # output.
    }

At times, it might be desirable to define macros that are valid only within a particular context pair; such macros should verify that they are only called within the correct context using either cis or cname.
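
Such a check might look like the sketch below. The cite macro and its HTML are hypothetical. The stand-in cis and the contextNames list model only the name of the current context so that the example runs in plain tclsh; inside Expand the real cis would be used. The sketch also assumes that cis returns a value usable as a Tcl boolean.

```tcl
# Stand-in for Expand's cis so the sketch runs on its own. It treats
# the last element of contextNames as the current context.
if {[llength [info commands cis]] == 0} {
    set contextNames {}
    proc cis {name} {
        global contextNames
        set current [lindex $contextNames end]
        return [expr {[string compare $current $name] == 0}]
    }
}

# Hypothetical macro that is valid only inside a footnote context.
proc cite {key} {
    if {![cis footnote]} {
        error "cite may only be used between footnote and /footnote"
    }
    return "<cite>$key</cite>"
}

# Inside a (simulated) footnote context the macro works...
set contextNames {footnote}
puts [cite Candide]

# ...and outside of it the macro raises an error.
set contextNames {}
if {[catch {cite Candide} message]} {
    puts "error: $message"
}
```

Raising an error with the Tcl error command lets the error output mode selected with -errout decide how the misuse is reported.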

STANDARD COMMANDS AND MACROS

In addition to the standard Tcl commands, and the user-defined macros, Expand input and rules files can use the following commands and macros.

cget varname
Given the name of a context variable, returns its value. See THE CONTEXT STACK.

cis name
Returns true if the current context has the given name. See THE CONTEXT STACK.

cname
Returns the name of the current context. See THE CONTEXT STACK.

cpop name
Returns all text expanded in the current context, and pops the current context off of the stack. The context name must match the previous cpush. See THE CONTEXT STACK.

cpush name
Pushes a new context onto the context stack. All text will be expanded into this context until the matching cpop. See THE CONTEXT STACK.

csave varname value
Sets the value of a context variable. See THE CONTEXT STACK.

cvar varname
Returns the real name of context variable varname, suitable for using in Tcl append and lappend calls.

expandText text
Expands all macros in the text and returns the expanded text.

expfile
Returns the name of the file currently being processed.

exppass
Returns the current pass number. The value will always be 1 if setpasses has not been called.

exppasses
Returns the total number of passes. The value will always be 1 if setpasses has not been called.

expwrite text
Writes the text to the current output file. This function should not be used by normal macros, which should return the text to add to the output rather than writing it explicitly; it is, however, useful when a function, such as the end_hook, needs to write summary output to several files. The files can be opened using setoutput, and the information written using expwrite.

getoptions arglist ?-strict? deflist
Parses and removes command line options from the arglist.

"arglist" must be the name of a variable in the current scope which contains a list of arguments. It's typically "argv".

If "-strict" is specified, unknown options are flagged as errors.

The "deflist" is a list of option definitions. Each option definition has one of the following forms. In each form, NAME is the option name, which must begin with a "-" character, and VAR is the name of a variable in the caller's scope which will receive the option's value.

{NAME VAR flag}
If option NAME appears on the command line, the variable VAR is set to 1; otherwise it will be set to 0.

{NAME VAR enum VAL1 VAL2...}
If option NAME appears on the command line, the next argument must be one of the enumerated values, VAL1, VAL2, etc. The variable is set to the chosen value, and defaults to VAL1 if NAME does not appear on the command line.

If the next argument begins with a hyphen, "-", getoptions assumes that the option's value is missing.

{NAME VAR string DEFVALUE}
If option NAME appears on the command line, the next argument is saved in variable VAR; otherwise, VAR is set to the DEFVALUE.

If the next argument begins with a hyphen, "-", getoptions assumes that the option's value is missing.

If any errors are found, an error message is thrown using the Tcl "error" command. Otherwise, the parsed options and their values are removed from the argument list variable.

For an example, see PARSING THE COMMAND LINE, above.

include fileName
Recursively expands the named file using the current rules, as though the file were part of the current file. The begin_file_hook and end_file_hook functions aren't called, and expfile returns the name of the including file.

lb
Returns the current left bracket token ("[" by default).

::expand::listingRuleSet ?options...?
This command loads the macros belonging to an optional rule set for including program listings in HTML documents. See LISTING RULE SET for more information.

popArg listvar
Returns the leftmost element contained in list variable listvar, removing that element from the list. If the list is empty, returns the empty string.
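
The described behavior can be pictured with the following plain-Tcl sketch. It is named popArgSketch to make clear that it is an illustration written for this page, not Expand's own source; the sample list is likewise made up.

```tcl
# Removes and returns the leftmost element of the list stored in the
# caller's variable named by listvar; returns "" for an empty list.
proc popArgSketch {listvar} {
    upvar 1 $listvar theList
    set first [lindex $theList 0]
    set theList [lrange $theList 1 end]
    return $first
}

set words {-numbers on hello.tcl}
puts [popArgSketch words]   ;# prints: -numbers
puts $words                 ;# prints: on hello.tcl
```

A command of this kind is convenient when a macro processes its own arguments one at a time.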

rb
Returns the current right bracket token ("]" by default).

readFile textFile
Reads the entire contents of the file, and returns it. No processing is done on the contents of the file.

setbrackets lb rb
Changes the left and right bracket tokens to "lb" and "rb". Any strings can be used, provided that "lb" and "rb" are neither equal nor empty. New tokens should be carefully chosen to interact well with the input text and not to confuse the Tcl parser.

setErrorOutputMode mode
Sets the error output mode to one of nothing, macro, error or fail; these modes are described under OPTIONS.

setoutput filename
Directs the output to the named file. If the filename is "", the empty string, output will go to stdout; if the filename is "nul", no output will be produced. setoutput is called by Expand just before each call to begin_hook.

setpasses numberOfPasses
Sets the number of passes to some positive value; see MULTI-PASS PROCESSING for details. setpasses should only be called in the init_hook; otherwise, the results are likely to be unpredictable.

textToID text
Converts a general text string into an ID string: leading and trailing whitespace and non-alphanumeric characters are removed, internal whitespace is converted to the "_" character, and all letters are converted to lower case. The resulting ID is useful as an HTML anchor name, or a Tcl array index. textToID is more commonly used in command definitions than directly in an input file.
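
One reading of this description can be sketched in plain Tcl as shown below. The name textToIdSketch marks it as an illustration rather than Expand's own code, and it assumes that non-alphanumeric characters are dropped wherever they occur, not only at the ends of the string.

```tcl
# Illustrative conversion of free text to an ID string.
proc textToIdSketch {text} {
    # Drop every character that is neither alphanumeric nor whitespace.
    regsub -all {[^[:alnum:][:space:]]} $text "" text
    # Remove leading and trailing whitespace.
    set text [string trim $text]
    # Collapse each run of internal whitespace into one underscore.
    regsub -all {[[:space:]]+} $text "_" text
    return [string tolower $text]
}

puts [textToIdSketch "  Hello, World!  "]   ;# prints: hello_world
```

A section macro could use such an ID as the anchor name for a heading, so that a table of contents can link to it.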

::expand::webRuleSet
This command loads the macros belonging to an optional rule set for creating web documents in HTML. See WEB RULE SET for more information.

WEB RULE SET

Though usable for other purposes, Expand is frequently used for generating web pages. The "web rule set" is a set of macros I've found useful for this purpose, which may optionally be loaded into any rule set by calling ::expand::webRuleSet. In addition, the "-web" command line option will load the rule set automatically.

dot
Outputs the HTML entity reference for a big black dot character.

link url ?text?
Formats an HTML link to another document or another part of this document. "url" is the URL. If "text" is specified, it is the link text; otherwise the URL is used as the link text. For example,

       [link foobar.html "A Document"]
       [link frobozz.html]
       
expands to

       <a href="foobar.html">A Document</a>
       <a href="frobozz.html">frobozz.html</a>
       
mailto address ?name?
Formats and returns an HTML "mailto" link; in most browsers, these links allow the user to send an e-mail message to the named address. "address" is the address; if "name" is given, it is specified as the link text, otherwise "address" is used. For example,

       [mailto will@wjduquette.com]
       [mailto will@wjduquette.com "Will Duquette"]
       
expands to

       <a href="mailto:will@wjduquette.com">will@wjduquette.com</a>
       <a href="mailto:will@wjduquette.com">Will Duquette</a>
       
tag name args
Outputs an HTML tag. "name" is the name of the tag and "args" is one or more pairs of attribute names and their values. This command is typically used in defining macros to output HTML; it saves backslashing some double quotes. For example,

       return "[tag a href foobar.html]foobar.html</a>"
       
is equivalent to

       return "<a href=\"foobar.html\">foobar.html</a>"
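
To make the attribute handling concrete, here is a small stand-alone sketch of a command with the same calling convention. It is called tagSketch because it is an illustration, not the rule set's actual code; details such as attribute quoting are assumptions.

```tcl
# Builds an opening HTML tag from a tag name followed by
# attribute-name/value pairs.
proc tagSketch {name args} {
    set result "<$name"
    foreach {attr value} $args {
        append result " $attr=\"$value\""
    }
    append result ">"
    return $result
}

puts "[tagSketch a href foobar.html]foobar.html</a>"
```

The printed line matches the backslashed form shown above.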
       

today ?format?
Returns today's date. The format is "dd MONTH yyyy" by default; "format" may be any valid format for the Tcl "clock format" command.
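
A macro of this kind can be approximated in a few lines of plain Tcl. The sketch below is named todaySketch to distinguish it from the real macro, and it assumes that "dd MONTH yyyy" corresponds to the clock format string "%d %B %Y" (two-digit day, full month name, four-digit year).

```tcl
# Returns the current date, formatted with an optional
# "clock format" format string.
proc todaySketch {{format "%d %B %Y"}} {
    return [clock format [clock seconds] -format $format]
}

puts [todaySketch]              ;# default "dd MONTH yyyy" style
puts [todaySketch "%Y-%m-%d"]   ;# ISO-style date
```

In an input file the corresponding call would simply be written between brackets, with or without a format argument.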

LISTING RULE SET

Though usable for other purposes, Expand is frequently used for generating web pages, and many of these web pages include program listings, especially Tcl program listings. The "listing rule set" is a set of macros written to format program listings nicely; in addition, it allows the listings to be saved to the disk so that they can be executed. The rule set is loaded by calling ::expand::listingRuleSet. This command takes the following options, which customize how listings are formatted.

-dir dirname
By default, listing files are written to the current directory. If this option is specified, they will be written to the directory named dirname instead.

-numbers
Listings can be line numbered. If the "-numbers" option is specified, all listings will be line numbered by default. Otherwise, line numbering will be off by default.

-strip
If the "-strip" option is specified, comments will be stripped from all listings by default. Otherwise, comments will be left in. A line is determined to be a comment if the first non-whitespace character on the line is a "#" character. Consequently, stripping won't work for languages which use a different comment character.

-class classname
Listings are formatted using the HTML <pre>...</pre> tag. This tag, like most HTML tags, has a "class" attribute that links it with a cascading style sheet (CSS). If this option is specified, the classname is used as the value of the <pre> tag's "class" attribute.

-style styletext
The <pre>...</pre> tag also has a "style" attribute that allows its display style to be set explicitly. If the -style option is specified, the styletext is used as the value of the <pre> tag's "style" attribute.

-numclass classname
Line numbers, when present, are formatted using the HTML <span>...</span> tag. This tag, like most HTML tags, has a "class" attribute that links it with a cascading style sheet (CSS). If this option is specified, the classname is used as the value of the <span> tag's "class" attribute.

-numstyle styletext
The <span>...</span> tag also has a "style" attribute that allows its display style to be set explicitly. If the -numstyle option is specified, the styletext is used as the value of the <span> tag's "style" attribute for all line numbers.

The listing rule set defines the following additional commands:

listing ?options...? ?filenames...?
Begins a program listing block. By default, the text contained within the block will be formatted and included in the HTML output; it will also be written to each of the specified filenames. The same filename can appear in more than one listing block; the contents of the file will be the concatenation of all of the listing blocks. If no filename is given, the listing is included in the HTML document, but is not written to disk.

The listing command takes the following options:

-silent
The code is written to any specified files, but isn't included in the HTML output.

-strip on|off
Turns comment stripping on or off, for this listing only.

-numbers on|off
Turns line numbering on or off, for this listing only.

For example, the following listing is line numbered, and included in the file "hello.tcl":

       [listing -numbers on hello.tcl]
       # Your first Tcl program
       puts "Hello, world!"
       [/listing]
       
/listing
Ends a program listing block.

tclsh command
Used to create listings of dialogs with the Tcl Interpreter. The specified command is formatted as though it were typed at the Tcl Interpreter and evaluated. For example,

       [listing]
       [tclsh {set a 5}]
       [tclsh {set b 10}]
       [tclsh {expr $a * $b}]
       [/listing]
       
will expand to

       $ set a 5
       5
       $ set b 10
       10
       $ expr $a * $b
       50
       

CHANGES SINCE VERSION 2.0

CHANGES SINCE VERSION 1.X

HISTORY

Expand was written by William H. Duquette; this man page was written using Expand as an HTML preprocessor.

Expand was inspired by a cursory study of the M4 macro processor. M4 is an interesting tool, but its syntax and use can become downright weird. I judged it to be an imperfect tool for preprocessing HTML documents, but it gave me some ideas about how to use Tcl in a similar way. After that, well, it just grew. Any similarity to the philosophy of XML/XSL is purely coincidental, as I didn't read about XML until afterwards.

Should you have any questions, comments, or suggestions about Expand, feel free to contact Will at will@wjduquette.com.

Expand is available from the Expand Home Page.


Copyright © 2000, by William H. Duquette. All rights reserved.