Subst routines

Overview

The Subst_parseRules() function takes a pointer to an array and a pointer to a text buffer. It parses the buffer for filename-matching specs and their associated text-substitution rules, and appends the parsed rules to the given array.

The Subst_runRules() function takes a pointer to an array of parsed rules, a name to match against those rules, and a buffer to change. The RbMake class uses it to rewrite the raw page data before parsing that data into an eBook.

You can also build your own "change-set" of substitutions, provided you add the changes in order from the start of the buffer to the end. Call Subst_initChangeset() to begin a new change-set (only one can be active at a time), add each change in order with Subst_addChange(), and finally call Subst_applyChangeset() to modify the buffer in one go, which keeps the memory moves efficient. Because Subst_runRules() itself uses the change-set routines, you cannot interleave the two interfaces. Even if you could, changing the buffer before the change-set is applied would corrupt the result.

For an example of how to use both the rules interface and the change-set interface, see rbburst.c.

Syntax

The syntax for the rule data is as follows:

You specify one or more URL-matching specs, each of which is either a wildcard spec (enclosed in double quotes) or a regular-expression spec (enclosed in slashes, as in /.../, or written as mX...X, where X can be any non-space character that serves as the delimiter). Terminate each matching spec with a colon, and follow the last one with a set of substitution rules inside curly braces.

For example:

"*.txt":
/foo\.htm$/:
m%/bar\.txt$%:
{
    SUBSTITUTION-RULES-GO-HERE;
}

Substitution rules look very much like the ones you would find in Perl, thanks to the PCRE (Perl-Compatible Regular Expressions) library plus some custom parsing code. There is currently no variable expansion on the matching side of a rule, and some of the character-rewriting escapes (such as \U, \L, \Q, and \E) are not supported yet. Branching is not supported at the moment either.

Some examples:

# Put common prefixes (The, A, An) at the end of the title.
s%(<title>)The (.*?)(?=</title>)%$1$2, The%i;
s%(<title>)(An?) (.*?)(?=</title>)%$1$3, $2%i;

# Get rid of all the FONT tags.
s%</?FONT\b[^>]*>%%g;

# Find some map images in the text, tag them with names, and add
# them to the "go to" menu.
s{(<img(?: \S+)* +src="map-(\d+)\.jpg"[^>]*>)}
 {<A NAME="M$2"></A>$1<META NAME="rocket-menu" CONTENT="Map $2=#M$2">\n}sig;