NAME

rbmake - a book-maker for .rb files


SYNOPSIS

rbmake [OPTIONS] [FILE/URL] [...]


DESCRIPTION

The rbmake utility can be used to make your own Rocket-eBook-compatible ebooks (.rb files) from HTML/text files found locally or via the web. Rbmake produces better results than the RocketWriter software, and provides you with more (and more powerful) options.

Here are some feature highlights:

See EXAMPLES for some simple examples of how to use rbmake.


OPTIONS

The following options may be specified on the commandline. See the option-file section for a way to specify a slightly more powerful set of options.

-a AUTHOR
Specify the book's author. For example: rbmake -a 'Tolstoy, Leo' wp.html

-b
Translate paragraph tags into book-style paragraphs (with an indented first line and no preceding empty line). Repeating this option increases the ``depth'' (in the HTML) at which the book paragraphs are expected to occur. For instance, if the P tags are enclosed in a DIV (rather than directly inside the BODY), you'd need to specify an extra -b to indicate that the P tags are an extra level deep.

-c FILE/URL
Set the cover image. The specified FILE/URL will be the first thing placed in the first page of HTML. This can be used even without specifying -i.

-D
Dump an option file to stdout and quit without doing any other work. You'll get the default options if you specify nothing other than -D.

-e
Enhance the punctuation using improved quotes, dashes, and ellipses. Double-quotes are translated into opening and closing double-quotes by alternating between them within a single paragraph (allowing for unclosed paragraph quotes). Backticks are translated into opening single-quotes. Apostrophes are translated into closing single-quotes. Two dashes in a row are translated into an em-dash. Three periods in a row (possibly separated by normal whitespace, but not by non-breaking spaces) are translated into an ellipsis character.

-E #
Edge-enhance images: none or 1 - 9 (9 = lots, default is 7).

-f no|yes|any|DEPTH
Follow links in the HTML documents to find new ones to include. If the value is a number, we follow links up to DEPTH links away from a user-specified page. Otherwise specify 'y' (or 'a') for unlimited depth, or 'n' for no following.

By default, only links that share a common path with the starting pages are followed. Use the -m/-M options to change this.

-g DESCRIPTION=URL
Add a menu-item to the ReB's ``Go To'' menu.

-h
Print this help message.

-i
Include images and audio (.wav) files.

-I FILE
Specify a .info file to use as the basis for creating the new info page. If you specify '-', rbmake will read from stdin. If you specify '.', the first ARG will be used to find the info information. You can specify the name of a .rb file here if you want the info page read from there.

-j
Join all the pages into a single, unified HTML page.

-J #
Join every # pages together into a unified HTML page.

-k
Generate a .hkey (dictionary index) for the root .html page.

-l FILE/URL
Load the option file from the indicated FILE/URL. See -D for how to easily create an option file.

-L ARG
Specify an ARG for an option file. E.g.: rbmake -L arg1 -L arg2 -l foo.opt

-m URL-PREFIX
Add the URL-PREFIX to the match-list that determines which links we follow. May be repeated as many times as needed. Implies -f.

-M URL-PREFIX
This works just like -m, plus it indicates that the -m/-M options are the only match-items that should be used (normally each ARG item adds its path to the match-list).

-n
Non-interactive mode (e.g. avoid username/password prompting).

-N #
Specify the maximum # of pages that we should fetch.

-o NAME
Specify the output NAME of the .rb file (default: the URL-name). If you specify '.', the name portion of the first ARG will be used.

-O
The output name for the .rb file will be the name from the -I option.

-p
Prompt for the title and author information (not affected by -n).

-P
Prompts for a username & password and then outputs the result base-64 encoded. This is useful if you want to slightly obscure some information for the Auth-Info option in an option file.

-R
Rewrite a .rb file (which must be the first argument). Using -R foo.rb is short for -OI foo.rb foo.rb. Note: images are stripped without -i.

-s FILE/URL
Read text-substitution rules from the indicated FILE/URL. Use '-' for stdin.

-t TITLE
Specify the book's TITLE. You'll probably need to quote the string. For example: rbmake -t 'War and Peace' wp.html

-T n|p|s
Specify the .txt translation mode. 'p' is preformatted (the default), 's' is simple paragraph translation (i.e. 2 newlines indicate a new paragraph), 'n' is no translation (only used when your rewriting rules are handling the translation).

-u
Unjoin the pages from inside an rbmake-joined .rb file.

-U URL-NAME
Specify the value of the (.info) URL-NAME. The default is a unique value.

-v
Output verbosely about problems found in the HTML.

-V
Output the version of rbmake.

-w
ARG names default to web pages (http://) rather than files (file:$CWD).

-W
Write the created .info page using the .rb file's name & a .info suffix.

-x MATCH
Exclude the matching URLs from being included (both links and images). May be repeated as many times as needed. The match strings are wildcard rules (not regular expressions).

-z
Allow <HR SIZE=0> to specify a page break (in addition to <HR NEW-PAGE>).

These options may be specified in an option file. See the -l commandline option for how to specify which option file to read, and the -L option for how to pass optional parameters to your option file.

Accept-URLs-Matching: URL-PREFIX
Add the URL-PREFIX to the match-list that determines which links we follow. May be repeated as many times as needed. Implies Follow-Links.

Allow-Old-Style-Page-Breaks: yes|no
Allow <HR SIZE=0> to specify a page break (in addition to <HR NEW-PAGE>).

Auth-Info: AUTHORIZATION-INFORMATION
If you want to put the username/password information for accessing your web pages into the option file, you can do so with the Auth-Info option. The format of the data is:

    http://www.site.com/path/|Realm Info|username:password

You can leave the ``Realm Info'' empty if you supply the /path/ info in the URL and, conversely, you can leave off the /path/ if you supply the ``Realm Info''. The ``username:password'' data may be supplied base-64 encoded, which gives you some minimal obscurity if someone looks over your shoulder when you're editing the file. See the -P option for a simple way to do this.
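
As a rough illustration of the base-64 form, the sketch below encodes a placeholder username:password pair with the standard base64 utility. That utility is assumed to be installed and is not part of rbmake; the -P option remains the documented way to produce this value.

```shell
# Placeholder credentials; substitute your own.
creds='username:password'

# printf avoids the trailing newline that echo would add to the encoded data.
encoded=$(printf '%s' "$creds" | base64)

# The encoded string takes the place of username:password in the option file.
echo "Auth-Info: proxy||$encoded"
```

Keep in mind that base-64 is an encoding, not encryption, so anyone with the file can decode it.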

If you need to specify the username/password for your proxy, you'd do so like this:

    Auth-Info: proxy||username:password

or

    Auth-Info: proxy||dXNlcm5hbWU6cGFzc3dvcmQ=

Auto-Accept-Input-File-Dirs: yes|no
The ``yes'' default means that every URL specified via Input-File automatically adds its directory (or URL path) to the accepted list of allowable files/URLs. Setting this to ``no'' means that you have to specify them all manually via Accept-URLs-Matching.

Book-Filename: NAME
Specify the output NAME of the .rb file (default: the URL-name). If you specify '.', the name portion of the first ARG will be used. If you specify the string _Put_Title_Here_ , rbmake will use the book's title to name the book. This naming occurs after all the normal title-determining steps have occurred, including reading the first HTML file and possibly prompting you to verify the title (and author).

Cover-Image: FILE/URL
Set the cover image. The specified FILE/URL will be the first thing placed in the first page of HTML. The image will also be resized to be as large as possible in the ReB's portrait orientation. This option can be used even without specifying the Include-Images option.

Enhance-Punctuation: yes|no|["'-.]
Enhance the punctuation using improved quotes, dashes, and ellipses. Double-quotes are translated into opening and closing double-quotes by alternating between them within a single paragraph (allowing for unclosed paragraph quotes). Backticks are translated into opening single-quotes. Apostrophes are translated into closing single-quotes. Two dashes in a row are translated into an em-dash. Three periods in a row (possibly separated by normal whitespace, but not by non-breaking spaces) are translated into an ellipsis character.

If you only want some enhancement performed, list the items you want enhanced instead of saying ``yes'' (i.e. specify " for double-quotes, ' for single-quotes, - for em-dashes, and . for ellipses).

Exclude-URLs-Matching: MATCH
Exclude the matching URLs from being included (both links and images). May be repeated as many times as needed. The match strings are wildcard rules (not regular expressions).

Follow-Links: no|yes|any|DEPTH
Follow links in the HTML documents to find new ones to include. If the value is a number, we follow links up to DEPTH links away from a user-specified page. Otherwise specify yes (or any) for unlimited depth, or no for no following.

HTTP-Header: HEADER: VALUE
Specify an HTTP header that should be included when sending HTTP requests to web servers. For example:

    HTTP-Header: User-Agent: Mozilla/4.0 (compatible ...)

Image-Edge-Enhancement: none|1-9
Image-Edge-Enhancement: none|1-9=MATCH
Edge-enhance images: none or 1 - 9 (9 is the most; default is 7). Specifying this without a =MATCH string will set the default value. Repeat this option (with various MATCH strings) as many times as needed. The match strings are wildcard rules (not regular expressions).

Import-Info-From: FILE/URL
Specify a .info file to use as the basis for creating the new info page. If you specify '-', rbmake will read from stdin. If you specify '.', the name of the first Input-File will be used to find the info information (if it is a .rb file, the info will be read from there, otherwise the file's suffix will be replaced with .info and the info will be read from that file). You can specify the name of a .rb file here if you want the info page read from there.

Include-Audio: no|yes|match
Include audio (.wav) files. Specifying ``yes'' means that the audio's path must only avoid matching one of the Exclude-URLs-Matching wildcards. Specifying ``match'' means that the path must also match one of the Accept-URLs-Matching wildcards.

Include-Images: no|yes|match
Include image (GIF/JPEG/PNG) files. Specifying ``yes'' means that the image's path must only avoid matching one of the Exclude-URLs-Matching wildcards. Specifying ``match'' means that the path must also match one of the Accept-URLs-Matching wildcards.

Input-File: FILE/URL
The indicated FILE/URL will be included in the book. If Follow-Links and Auto-Accept-Input-File-Dirs are also on, then the page's path is also added to the list of allowed links.

Input-Files-Default-To-Web-Pages: yes|no
An unqualified Input-File will be interpreted as a URL rather than being treated as a path name. In other words, by default ambiguous.net would be treated as a filename, but with this option turned on, it is treated like a web page. In either case, you can always force the right meaning by prefixing a web page with http: and prefixing a file with file:.

Make-Dictionary-Index: yes|no
Generate a .hkey (dictionary index) for the root .html page.

Menu-Item: DESCRIPTION=URL
Add a menu-item to the ReB's ``Go To'' menu.

Non-Interactive: yes|no
Avoid any unrequested prompts (e.g. username/password prompting).

Page-Joining: none|all|COUNT
If ``all'' is specified, every HTML/text page is joined into a single page. If the value is a number, a maximum of COUNT pages in a row will be joined together (which is sometimes useful to avoid creating a single page that is too large for the ReB to handle).

Prompt-For-Book-Info: yes|no
Ask the user what the Title and Author should be. The prompt occurs after we read the first page, so the default answer is the title/author info we were able to figure out from there (unless specified via option).

Set-Info: NAME=VALUE
Set one of the NAME=VALUE pairs that get put into the new info page. Valid NAMEs include TITLE, AUTHOR, GENRE, ISBN, URL, etc.

Substitution-Rule-File: FILE/URL
Read text-substitution rules from the indicated FILE/URL. Use '-' for stdin OR in combination with a dashed divider line to indicate that the rest of the option file should be read as the substitution rules.

Text-Conversion: none|preformatted|simple-para
Specify the .txt translation mode. The default is preformatted (which works like putting the text inside an HTML <PRE> section), simple-para turns an empty line into a paragraph break, and ``none'' specifies that we should do no translation (and should only be used when you have supplied your own rewriting rules via the Substitution-Rule-File).

Use-Book-Paragraphs: yes|no
Translate paragraph tags into book-style paragraphs (with an indented first line and no preceding empty line).

Unjoin-Rb-Files: yes|no
Unjoin the pages from inside an rbmake-joined .rb file.

Verbose-Output: yes|no
Output verbosely about problems found in the HTML.


SUBSTITUTION SYNTAX

The Substitution-Rule-File has the following syntax:

You specify one or more URL-matching specs by either specifying a wildcard-matching spec (using double-quotes) or a regular-expression-matching spec (using '/' or ``mX'', where 'X' can be any non-space character). You must terminate each matching spec with a colon, and then follow the last one with a set of substitution rules inside curly braces.

One (unlikely) example matching spec:

    "*.txt":
    /foo\.htm$/:
    m%/bar\.txt$%:
    {
        SUBSTITUTION-RULES-GO-HERE;
    }

Note that wildcard matches are anchored (i.e. you'd need to use ``*foo*'' to match ``foo'' anywhere), and that regular expressions have to be manually anchored (using '^' and/or '$', as needed).
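
To get a feel for anchored wildcard matching, here is a small sketch that uses the shell's own case patterns, which are also anchored. The wild_match helper and the sample URLs are made up for this illustration, and rbmake's wildcard syntax may differ from shell globbing in its details.

```shell
# Succeed only when the WHOLE string ($2) matches the glob pattern ($1).
wild_match() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# A bare "foo" does not match a longer string, because the match is anchored.
wild_match 'foo'   'http://site/foo.html' || echo "foo: no match"

# Surrounding the text with "*" lets it match anywhere in the string.
wild_match '*foo*' 'http://site/foo.html' && echo "*foo*: match"

# A leading "*" anchors only the suffix.
wild_match '*.txt' 'notes/chapter1.txt'   && echo "*.txt: match"
```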

Substitution rules look very much like those that you'd find in perl due to the use of the PCRE (Perl-Compatible Regular Expressions) library and some custom parsing code. There is currently no variable expansion in the matching side of the rules and no support (yet) for some of the character rewriting rules (such as \U, \L, \Q, and \E). There is also no support for any programmatic branching at the moment.

Some substitution rules:

    # Put common prefixes (The, A, An) at the end of the title.
    s%(<title>)The (.*?)(?=</title>)%$1$2, The%i;
    s%(<title>)(An?) (.*?)(?=</title>)%$1$3, $2%i;
    #
    # Get rid of all the FONT tags.
    s%</?FONT\b[^>]*>%%g;
    #
    # Find some map images in the text, tag them with names, and add
    # them to the "Go To" menu.
    s{(<img(?: \S+)* +src="map-(\d+)\.jpg"[^>]*>)}
     {<A NAME="M$2"></A>$1<META NAME="rocket-menu" CONTENT="Map $2=#M$2">\n}sig;
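
If you want to preview roughly what a simple rule will do before putting it into a rule file, you can try a similar expression with sed. The sketch below approximates the FONT-stripping rule above. The sample input is invented, and the \b is dropped because sed's regular expressions are not PCRE, so treat the result as an approximation rather than exactly what rbmake will produce.

```shell
# Sample input line containing FONT tags (made up for this sketch).
html='<FONT SIZE=2>Chapter 1</FONT> begins here.'

# Same % delimiter and g flag as the rbmake rule; -E selects extended
# regular expressions so that "?" means "optional".
cleaned=$(printf '%s\n' "$html" | sed -E 's%</?FONT[^>]*>%%g')

echo "$cleaned"
```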


ENVIRONMENT

If you use an http proxy, you should set the HTTP_PROXY environment variable with the name:port information needed to access your proxy.
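
For instance, in a Bourne-style shell the variable could be set as sketched below. The proxy host and port are placeholders, not real values.

```shell
# Hypothetical proxy host and port; replace with your own name:port.
HTTP_PROXY='proxy.example.com:8080'
export HTTP_PROXY

# Any rbmake command started from this shell now inherits the setting, e.g.:
#   rbmake -o bar http://www.somewhere.com/foo.html
```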


EXAMPLES

If you had a file named foo.html and you wanted to create a book named bar.rb from it, you could type this command:

    rbmake -o bar foo.html

If you wanted to follow the links in foo.html to other documents in the same directory (or its subdirectories), you could simply add the -fa (follow-links to any depth) option. If you'd like to also turn web paragraphs into book paragraphs and enhance the punctuation, also add the -b and -e options. Like this:

    rbmake -befa -obar foo.html

If you liked these options and wanted to turn them into an option file, just add a -D option to the line. In this case, rbmake will not create a book, but will just dump an option file to stdout. Let's also change the input file to be a web page. Like this:

    rbmake -D -befa -obar http://www.somewhere.com/foo.html >foo.opt

You can now use the foo.opt file instead of all the enclosed commandline options:

    rbmake -l foo.opt
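
You can also write an option file by hand. The sketch below creates one covering the settings discussed above (book paragraphs, enhanced punctuation, unlimited link following, and the output name bar), using option names from the OPTIONS section. The one-option-per-line layout is an assumption based on the Auth-Info and HTTP-Header samples; the file that -D actually dumps may contain more entries or look somewhat different.

```shell
# Write a hand-made option file; the quoted 'EOF' keeps the shell from
# altering the text in any way.
cat > foo.opt <<'EOF'
Input-File: http://www.somewhere.com/foo.html
Book-Filename: bar
Use-Book-Paragraphs: yes
Enhance-Punctuation: yes
Follow-Links: any
EOF

# Show what was written.
cat foo.opt
```

Running rbmake -l foo.opt would then load these settings.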

It is also possible to combine an option file with commandline options and to make an option file work with user-specified arguments. For instance, the option file ws.opt (in the samples dir) will reformat any Baen Webscription HTML ebook you like, but you have to tell it what prefix to use. For instance:

    rbmake -L 0671578545 -l ws.opt

In this case, the -L option specifies a single parameter that gets passed to the ws.opt file (which refers to it via the $1 variable).
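
As a purely hypothetical sketch of a parameterized option file (this is not the contents of ws.opt), the following creates a file that uses $1 in two values. The file name param.opt, the example.com URL, and the assumption that $1 may appear inside these particular option values are all made up for illustration.

```shell
# The quoted 'EOF' stops the shell from expanding $1, so the literal
# text "$1" ends up in the file for rbmake to fill in.
cat > param.opt <<'EOF'
Input-File: http://www.example.com/books/$1/index.html
Book-Filename: $1
Follow-Links: any
EOF

# Show what was written.
cat param.opt
```

A command such as rbmake -L 0671578545 -l param.opt would then supply the value for $1.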


FILES

The only files that rbmake reads are the ones that you specify in the options or on the command-line.


SEE ALSO

the rbburst manpage, the rbinfo manpage, the rbdump manpage


AUTHOR

Wayne Davison <wayned@users.sourceforge.net>.