rbmake - a book-maker for .rb files
rbmake [OPTIONS] [FILE/URL] [...]
The rbmake utility can be used to make your own Rocket-eBook-compatible
ebooks (.rb files) from HTML/text files found locally or via the web.
Rbmake produces better results than the RocketWriter software, and
provides you with more (and more powerful) options.
Here are some feature highlights:
See EXAMPLES for some simple examples of how to use rbmake.
The following options may be specified on the commandline. See the
option-file section for a way to specify a slightly more powerful set of
options.
- -a AUTHOR
- Specify the book's author. For example:
rbmake -a 'Tolstoy, Leo' wp.html
- -b
- Translate paragraph tags into book-style paragraphs (with an indented
first line and no preceding empty line). Repeating this option increases
the ``depth'' (in the HTML) at which the book paragraphs are expected to
occur. For instance, if the P tags are enclosed in a DIV (rather than
directly inside the BODY), you'd need to specify an extra -b to indicate
that the P tags are an extra level deep.
- -c FILE/URL
- Set the cover image. The specified FILE/URL will be the first thing
placed in the first page of HTML. This can be used even without
specifying -i.
- -D
- Dump an option file to stdout and quit without doing any other work.
You'll get the default options if you specify nothing other than -D.
- -e
- Enhance the punctuation using improved quotes, dashes, and ellipses.
Double-quotes are translated into opening and closing double-quotes by
alternating between them within a single paragraph (allowing for
unclosed paragraph quotes). Backticks are translated into opening
single-quotes. Apostrophes are translated into closing single-quotes.
Two dashes in a row are translated into an em-dash. Three periods in
a row (possibly separated by normal whitespace, but not by
non-breaking spaces) are translated into an ellipsis character.
- -E #
- Edge-enhance images: none or 1 - 9 (9 = lots, default is 7).
- -f no|yes|any|DEPTH
- Follow links in the HTML documents to find new ones to include. If
the value is a number, we follow links up to DEPTH links away from a
user-specified page. Otherwise specify 'y' (or 'a') for unlimited
depth, or 'n' for no following.
- By default, only links that share a common path with the starting
pages are followed. Use the -m/-M options to change this.
- -g DESCRIPTION=URL
- Add a menu-item to the ReB's ``Go To'' menu.
- -h
- Print this help message.
- -i
- Include images and audio (.wav) files.
- -I FILE
- Specify a .info file to use as the basis for creating the new info page.
If you specify '-', rbmake will read from stdin. If you specify '.', the
first ARG will be used to find the info information. You can specify the
name of a .rb file here if you want the info page read from there.
- -j
- Join all the pages into a single, unified HTML page.
- -J #
- Join every # pages together into a unified HTML page.
- -k
- Generate a .hkey (dictionary index) for the root .html page.
- -l FILE/URL
- Load the option file from the indicated FILE/URL. See -D for how to
easily create an option file.
- -L ARG
- Specify an ARG for an option file. E.g.:
rbmake -L arg1 -L arg2 -l foo.opt
- -m URL-PREFIX
- Add the URL-PREFIX to the match-list that determines which links we
follow. May be repeated as many times as needed. Implies -f.
- -M URL-PREFIX
- This works just like -m, plus it indicates that the -m/-M options
are the only match-items that should be used (normally each ARG item adds
its path to the match-list).
- -n
- Non-interactive mode (e.g. avoid username/password prompting).
- -N #
- Specify the maximum # of pages that we should fetch.
- -o NAME
- Specify the output NAME of the .rb file (default: the URL-name).
If you specify '.', the name portion of the first ARG will be used.
- -O
- The output name for the .rb file will be the name from the -I option.
- -p
- Prompt for the title and author information (not affected by -n).
- -P
- Prompts for a username & password and then outputs the result base-64
encoded. This is useful if you want to slightly obscure some information
for the Auth-Info option in an option file.
- -R
- Rewrite a .rb file (which must be the first argument). Using -R foo.rb
is short for -OI foo.rb foo.rb. Note: images are stripped without -i.
- -s FILE/URL
- Read text-substitution rules from the indicated FILE/URL.
Use '-' for stdin.
- -t TITLE
- Specify the book's TITLE. You'll probably need to quote the string. For
example:
rbmake -t 'War and Peace' wp.html
- -T n|p|s
- Specify the .txt translation mode. 'p' is preformatted (the default),
's' is simple paragraph translation (i.e. 2 newlines indicate a new
paragraph), 'n' is no translation (only used when your rewriting rules
are handling the translation).
- -u
- Unjoin the pages from inside an rbmake-joined .rb file.
- -U URL-NAME
- Specify the value of the (.info) URL-NAME. The default is a unique
value.
- -v
- Output verbosely about problems found in the HTML.
- -V
- Output the version of rbmake.
- -w
- ARG names default to web pages (http://) rather than files (file:$CWD).
- -W
- Write the created .info page using the .rb file's name & a .info suffix.
- -x MATCH
- Exclude the matching URLs from being included (both links and images).
May be repeated as many times as needed. The match strings are wildcard
rules (not regular expressions).
- -z
- Allow <HR SIZE=0> to specify a page break (in addition to <HR NEW-PAGE>).
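As an illustration of the punctuation rules listed under -e, here is a minimal Python sketch. It is not rbmake's implementation: the function name enhance_punctuation is hypothetical, and the sketch works on one paragraph of plain text, ignoring HTML tags and entities.

```python
import re


def enhance_punctuation(paragraph: str) -> str:
    """Apply -e style rewrites to a single paragraph of plain text."""
    # Three periods, optionally separated by ordinary whitespace (but not
    # by non-breaking spaces), become one ellipsis character (U+2026).
    text = re.sub(r"\.[ \t\n]*\.[ \t\n]*\.", "\u2026", paragraph)
    # Two dashes in a row become an em-dash (U+2014).
    text = text.replace("--", "\u2014")
    # Backticks open a single quote; apostrophes close one.
    text = text.replace("`", "\u2018").replace("'", "\u2019")
    # Double quotes alternate between opening and closing. The state is
    # reset for every paragraph, so an unclosed quote does not leak into
    # the next paragraph.
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```

Calling the function once per paragraph is what makes the "unclosed paragraph quotes" case work: each paragraph starts again with an opening quote.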
These options may be specified in an option file. See the -l
commandline option for how to specify which option file to read, and the
-L option for how to pass optional parameters to your option file.
- Accept-URLs-Matching: URL-PREFIX
- Add the URL-PREFIX to the match-list that determines which links we
follow. May be repeated as many times as needed. Implies
Follow-Links.
- Allow-Old-Style-Page-Breaks: yes|no
- Allow <HR SIZE=0> to specify a page break (in addition to <HR NEW-PAGE>).
- Auth-Info: AUTHORIZATION-INFORMATION
- If you want to put the username/password information for accessing your
web pages into the option file, you can do so with the Auth-Info option.
The format of the data is:
http://www.site.com/path/|Realm Info|username:password
- You can leave the ``Realm Info'' empty if you supply the /path/ info in the
URL and conversely, you can leave off the /path/ if you supply the ``Realm
Info''. The ``username:password'' data may be supplied base-64 encoded, which
gives you some minimal obscurity if someone looks over your shoulder when
you're editing the file. See the -P option for a simple way to do
this.
- If you need to specify the username/password for your proxy, you'd do so
like this:
Auth-Info: proxy||username:password
or
Auth-Info: proxy||dXNlcm5hbWU6cGFzc3dvcmQ=
- Auto-Accept-Input-File-Dirs: yes|no
- The ``yes'' default means that every URL specified via Input-File
automatically adds its directory (or URL path) to the accepted list of
allowable files/URLs. Setting this to ``no'' means that you have to specify
them all manually via Accept-URLs-Matching.
- Book-Filename: NAME
- Specify the output NAME of the .rb file (default: the URL-name).
If you specify '.', the name portion of the first ARG will be used.
If you specify the string _Put_Title_Here_, rbmake will use the book's
title to name the book. This naming occurs after all the normal
title-determining steps have occurred, including reading the first HTML
file and possibly prompting you to verify the title (and author).
- Cover-Image: FILE/URL
- Set the cover image. The specified FILE/URL will be the first thing
placed in the first page of HTML. The image will also be resized to be as
large as possible in the ReB's portrait orientation. This option can be
used even without specifying the Include-Images option.
- Enhance-Punctuation: yes|no|["'-.]
- Enhance the punctuation using improved quotes, dashes, and ellipses.
Double-quotes are translated into opening and closing double-quotes by
alternating between them within a single paragraph (allowing for
unclosed paragraph quotes). Backticks are translated into opening
single-quotes. Apostrophes are translated into closing single-quotes.
Two dashes in a row are translated into an em-dash. Three periods in
a row (possibly separated by normal whitespace, but not by
non-breaking spaces) are translated into an ellipsis character.
- If you only want some enhancement performed, list the items you want
enhanced instead of saying ``yes'' (i.e. specify " for double-quotes,
' for single-quotes, - for em-dashes, and . for ellipses).
- Exclude-URLs-Matching: MATCH
- Exclude the matching URLs from being included (both links and images).
May be repeated as many times as needed. The match strings are wildcard
rules (not regular expressions).
- Follow-Links: no|yes|any|DEPTH
- Follow links in the HTML documents to find new ones to include. If
the value is a number, we follow links up to DEPTH links away from a
user-specified page. Otherwise specify yes (or any) for unlimited
depth, or no for no following.
- HTTP-Header: HEADER: VALUE
- Specify an HTTP header that should be included when sending HTTP requests
to web servers. For example:
HTTP-Header: User-Agent: Mozilla/4.0 (compatible ...)
- Image-Edge-Enhancement: none|1-9
- Image-Edge-Enhancement: none|1-9=MATCH
- Edge-enhance images: none or 1 - 9 (9 is the most; default is 7).
Specifying this without a =MATCH string will set the default value.
Repeat this option (with various MATCH strings) as many times as needed.
The match strings are wildcard rules (not regular expressions).
- Import-Info-From: FILE/URL
- Specify a .info file to use as the basis for creating the new info page.
If you specify '-', rbmake will read from stdin. If you specify '.', the
name of the first Input-File will be used to find the info information
(if it is a .rb file, the info will be read from there, otherwise the
file's suffix will be replaced with .info and the info will be read from
that file). You can specify the name of a .rb file here if you want the
info page read from there.
- Include-Audio: no|yes|match
- Include audio (.wav) files. Specifying ``yes'' means that the audio's path
must only avoid matching one of the Exclude-URLs-Matching wildcards.
Specifying ``match'' means that the path must also match one of the
Accept-URLs-Matching wildcards.
- Include-Images: no|yes|match
- Include image (GIF/JPEG/PNG) files. Specifying ``yes'' means that the
image's path must only avoid matching one of the Exclude-URLs-Matching
wildcards. Specifying ``match'' means that the path must also match
one of the Accept-URLs-Matching wildcards.
- Input-File: FILE/URL
- The indicated FILE/URL will be included in the book. If
Follow-Links and Auto-Accept-Input-File-Dirs are both on, then the
page's path is also added to the list of allowed links.
- Input-Files-Default-To-Web-Pages: yes|no
- An unqualified Input-File will be interpreted as a URL rather than
being treated as a path name. In other words, by default ambiguous.net
would be treated as a filename, but with this option turned on, it is
treated like a web page. No matter what, you can always force the right
meaning by prefixing a web page with http: and prefixing a file with
file:.
- Make-Dictionary-Index: yes|no
- Generate a .hkey (dictionary index) for the root .html page.
- Menu-Item: DESCRIPTION=URL
- Add a menu-item to the ReB's ``Go To'' menu.
- Non-Interactive: yes|no
- Avoid any unrequested prompts (e.g. username/password prompting).
- Page-Joining: none|all|COUNT
- If ``all'' is specified, every HTML/text page is joined into a single page.
If the value is a number, a maximum of COUNT pages in a row will be
joined together (which is sometimes useful to avoid creating a single page
that is too large for the ReB to handle).
- Prompt-For-Book-Info: yes|no
- Ask the user what the Title and Author should be. The prompt occurs after
we read the first page, so the default answer is the title/author info we
were able to figure out from there (unless specified via option).
- Set-Info: NAME=VALUE
- Set one of the NAME=VALUE pairs that get put into the new info page.
Valid NAMEs include TITLE, AUTHOR, GENRE, ISBN, URL, etc.
- Substitution-Rule-File: FILE/URL
- Read text-substitution rules from the indicated FILE/URL.
Use '-' for stdin, or use it in combination with a dashed divider line to
indicate that the rest of the option file should be read as the
substitution rules.
- Text-Conversion: none|preformatted|simple-para
- Specify the .txt translation mode. The default is preformatted (which
works like putting the text inside an HTML <PRE> section), simple-para
turns an empty line into a paragraph break, and ``none'' specifies that we
should do no translation (it should only be used when you have supplied
your own rewriting rules via the Substitution-Rule-File).
- Use-Book-Paragraphs: yes|no
- Translate paragraph tags into book-style paragraphs (with an indented
first line and no preceding empty line).
- Unjoin-Rb-Files: yes|no
- Unjoin the pages from inside an rbmake-joined .rb file.
- Verbose-Output: yes|no
- Output verbosely about problems found in the HTML.
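The base-64 form of the Auth-Info credentials shown above can be reproduced with a short Python sketch. The helper names encode_credentials and auth_info_line are hypothetical, and the sketch assumes standard base-64 over the literal ``username:password'' string, which is what the proxy example in this section corresponds to; the -P option remains the documented way to produce the value.

```python
import base64


def encode_credentials(username: str, password: str) -> str:
    """Return "username:password" encoded as standard base-64 text."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def auth_info_line(location: str, realm: str, username: str, password: str) -> str:
    """Build an option-file line in the location|realm|credentials layout."""
    return f"Auth-Info: {location}|{realm}|{encode_credentials(username, password)}"


# Reproduces the proxy example from the Auth-Info description.
print(auth_info_line("proxy", "", "username", "password"))
```

Base-64 is an encoding, not encryption, so this only guards against a casual glance at the file.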
The Substitution-Rule-File has the following syntax:
You specify one or more URL-matching specs by either specifying a
wildcard-matching spec (using double-quotes) or a
regular-expression-matching spec (using '/' or ``mX'', where 'X' can be any
non-space character). You must terminate the matching spec with a colon,
and then follow the last one with a set of substitution rules inside
curly braces.
One (unlikely) example matching spec:
"*.txt":
/foo\.htm$/:
m%/bar\.txt$%:
{
SUBSTITUTION-RULES-GO-HERE;
}
Note that wildcard matches are anchored (i.e. you'd need to use ``*foo*''
to match ``foo'' anywhere), and that regular expressions have to be manually
anchored (using '^' and/or '$', as needed).
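The difference between anchored wildcards and unanchored regular expressions can be seen in a small Python sketch. It uses Python's fnmatch and re modules as stand-ins; rbmake's own wildcard syntax may differ in detail, and the helper names wildcard_matches and regex_matches are hypothetical.

```python
import fnmatch
import re


def wildcard_matches(pattern: str, url: str) -> bool:
    # fnmatchcase compares the pattern against the whole string, so the
    # match is implicitly anchored at both ends.
    return fnmatch.fnmatchcase(url, pattern)


def regex_matches(pattern: str, url: str) -> bool:
    # re.search finds the pattern anywhere in the string unless the
    # pattern itself carries ^ and/or $ anchors.
    return re.search(pattern, url) is not None


url = "http://www.somewhere.com/foo.html"
print(wildcard_matches("foo", url))      # anchored: "foo" is not the whole URL
print(wildcard_matches("*foo*", url))    # wildcards on both sides match anywhere
print(regex_matches(r"foo", url))        # unanchored regex matches anywhere
print(regex_matches(r"foo\.htm$", url))  # "$" requires the URL to end in foo.htm
```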
Substitution rules look very much like those that you'd find in perl due
to the use of the PCRE (Perl-Compatible Regular Expressions) library and
some custom parsing code. There is currently no variable expansion in the
matching side of the rules and no support (yet) for some of the character
rewriting rules (such as \U, \L, \Q, and \E). There is also no support
for any programmatic branching at the moment.
Some substitution rules:
# Put common prefixes (The, A, An) at the end of the title.
s%(<title>)The (.*?)(?=</title>)%$1$2, The%i;
s%(<title>)(An?) (.*?)(?=</title>)%$1$3, $2%i;
#
# Get rid of all the FONT tags.
s%</?FONT\b[^>]*>%%g;
#
# Find some map images in the text, tag them with names, and add
# them to the "Go To" menu.
s{(<img(?: \S+)* +src="map-(\d+)\.jpg"[^>]*>)}
{<A NAME="M$2"></A>$1<META NAME="rocket-menu" CONTENT="Map $2=#M$2">\n}sig;
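To check what the first three rules do before putting them in a rule file, they can be approximated with Python's re module, whose syntax is close to PCRE for these patterns. This is only a sketch: apply_rules is a hypothetical helper, and count=1 stands in for the absence of the g flag on the title rules.

```python
import re


def apply_rules(html: str) -> str:
    """Approximate the title and FONT rules with Python regular expressions."""
    # s%(<title>)The (.*?)(?=</title>)%$1$2, The%i;
    html = re.sub(r"(<title>)The (.*?)(?=</title>)", r"\1\2, The",
                  html, count=1, flags=re.IGNORECASE)
    # s%(<title>)(An?) (.*?)(?=</title>)%$1$3, $2%i;
    html = re.sub(r"(<title>)(An?) (.*?)(?=</title>)", r"\1\3, \2",
                  html, count=1, flags=re.IGNORECASE)
    # s%</?FONT\b[^>]*>%%g;  (no i flag, so only upper-case FONT tags match)
    html = re.sub(r"</?FONT\b[^>]*>", "", html)
    return html


print(apply_rules("<title>The Hobbit</title>"))
print(apply_rules("<title>A Tale of Two Cities</title>"))
print(apply_rules('<FONT SIZE="2">plain text</FONT>'))
```

Because the FONT rule has no i flag, lower-case font tags pass through untouched; add the i flag to the rule if the source HTML mixes cases.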
If you use an http proxy, you should set the HTTP_PROXY environment
variable with the name:port information needed to access your proxy.
If you had a file named foo.html and you wanted to create a book named
bar.rb from it, you could type this command:
rbmake -o bar foo.html
If you wanted to follow the links in foo.html to other documents in the
same directory (or its subdirectories), you could simply add the -fa
(follow-links to any depth) option. If you'd like to also turn web
paragraphs into book paragraphs and enhance the punctuation, also add the
-b and -e options. Like this:
rbmake -befa -obar foo.html
If you liked these options and wanted to turn them into an option file,
just add a -D option to the line. In this case, rbmake will not create
a book, but will just dump an option file to stdout. Let's also change
the input file to be a web page. Like this:
rbmake -D -befa -obar http://www.somewhere.com/foo.html >foo.opt
You can now use the foo.opt file instead of all the enclosed commandline
options:
rbmake -l foo.opt
It is also possible to combine an option file with commandline options and
to make an option file work with user-specified arguments. For instance,
the option file ws.opt (in the samples dir) will reformat any Baen
Webscription HTML ebook you like, but you have to tell it what prefix to
use. For instance:
rbmake -L 0671578545 -l ws.opt
In this case, the -L option specifies a single parameter that gets
passed to the ws.opt file (which refers to it via the $1 variable).
The only files that rbmake reads are the ones that you specify in the
options or on the command-line.
the rbburst manpage, the rbinfo manpage, the rbdump manpage
Wayne Davison <wayned@users.sourceforge.net>.