RbFile Class

Overview

To read a .rb file, you use the RbFile_open() function. It takes a filename and a flag for how you want to read in the ToC ("Table of Contents" -- the directory of "page" sections that are contained within the .rb file). The flag values are RB_OPENFLAG_INCLUDE_HIDDEN, RB_OPENFLAG_INCLUDE_IMAGES, RB_OPENFLAG_INCLUDE_AUDIO, and RB_OPENFLAG_UNJOIN. If you specify no flags (use "0"), only the info page and the html/text pages are included in the list of ToC items. (The RB_OPENFLAG_UNJOIN flag will be discussed in detail further on.)
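
As a rough illustration of how the open flags widen the ToC list, the sketch below models the filter as a simple predicate. The DEMO_ constants and demo_page_is_listed() are hypothetical stand-ins, not the library's definitions (the real names and values live in rbfile.h), and treating .hkey pages as "hidden" alongside .hidx pages is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RB_OPENFLAG_* bits; the values are illustrative only. */
enum {
    DEMO_OPENFLAG_INCLUDE_HIDDEN = 1 << 0,
    DEMO_OPENFLAG_INCLUDE_IMAGES = 1 << 1,
    DEMO_OPENFLAG_INCLUDE_AUDIO  = 1 << 2
};

/* Hypothetical stand-ins for a few of the RB_PAGETYPE_* values. */
typedef enum {
    DEMO_PAGE_INFO, DEMO_PAGE_HTML, DEMO_PAGE_TEXT,
    DEMO_PAGE_IMAGE, DEMO_PAGE_AUDIO, DEMO_PAGE_HIDX, DEMO_PAGE_HKEY
} DemoPageType;

/* Would a page of this type appear on the ToC list for the given open flags? */
bool demo_page_is_listed(DemoPageType type, int openFlags)
{
    switch (type) {
    case DEMO_PAGE_INFO:
    case DEMO_PAGE_HTML:
    case DEMO_PAGE_TEXT:
        return true;  /* always listed, even with flags == 0 */
    case DEMO_PAGE_IMAGE:
        return (openFlags & DEMO_OPENFLAG_INCLUDE_IMAGES) != 0;
    case DEMO_PAGE_AUDIO:
        return (openFlags & DEMO_OPENFLAG_INCLUDE_AUDIO) != 0;
    case DEMO_PAGE_HIDX:
    case DEMO_PAGE_HKEY:  /* assumption: .hkey is hidden just like .hidx */
        return (openFlags & DEMO_OPENFLAG_INCLUDE_HIDDEN) != 0;
    }
    return false;
}
```

With flags of 0 only the info, HTML, and text cases return true, which matches the default described above.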

If the open call succeeds, there will be a linked list of ToC objects that you can browse by calling "RbFile_getTocHead()" to obtain the pointer to the first object. The "info" page will always be the first entry on this list, and is guaranteed to be there. This lets you easily refer to it without searching, or to easily skip it (if you want to start your loop with the ToC_getNext() item).
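
The "skip the info page" loop might look something like the sketch below. It does not link against the library: DemoToC and demo_getNext() are hypothetical stand-ins for the real (opaque) ToC object and ToC_getNext(), whose exact signatures you should check in rbfile.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's ToC object. */
typedef struct DemoToC {
    const char *name;
    struct DemoToC *next;
} DemoToC;

/* Stand-in for ToC_getNext(); the real signature may differ. */
static DemoToC *demo_getNext(DemoToC *toc)
{
    return toc->next;
}

/* Count the pages that follow the info page.
 * `head` plays the role of the RbFile_getTocHead() result, so it is the
 * info page and is assumed to be non-NULL (the open call succeeded). */
size_t demo_count_content_pages(DemoToC *head)
{
    size_t n = 0;
    /* Start at the item after the head so the info page is skipped. */
    for (DemoToC *toc = demo_getNext(head); toc != NULL; toc = demo_getNext(toc))
        n++;
    return n;
}
```

To include the info page, start the loop at the head itself instead of at its successor.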

The other accessors that are available are not needed very often, but here they all are:

RbFile_getFileName() returns the book's filename.
RbFile_getNewFileName() returns the temporary name that we're using to construct the new .rb file if the RbFile is open for create.
RbFile_getTocCnt() returns how many ToC entries there are.
RbFile_getMaxTocCnt() returns the maximum number of ToC entries available to us when creating a .rb file.
RbFile_getFileSize() returns how big the .rb file is.
RbFile_getUnderlineCnt() returns the total number of underline objects attached to the various ToC items.
RbFile_getBookmarkCnt() returns the total number of bookmark objects attached to the various ToC items.
RbFile_getNoteCnt() returns the total number of note objects attached to the various ToC items.

The ToC object has the following accessors:

ToC_getName() -- the name for the page (maximum length is RB_TOC_NAMELEN).
ToC_getType() -- this is an int value representing what kind of a page this is (and it is present only in the in-memory object). The page-type values that can be present when reading a .rb file are: RB_PAGETYPE_UNKNOWN, RB_PAGETYPE_HTML, RB_PAGETYPE_TEXT, RB_PAGETYPE_IMAGE, RB_PAGETYPE_AUDIO, RB_PAGETYPE_INFO, RB_PAGETYPE_HIDX, RB_PAGETYPE_HKEY, and RB_PAGETYPE_IGNORE. Some other values that are used internally when creating ebooks are: RB_PAGETYPE_RAW_TEXT, RB_PAGETYPE_COVER_IMAGE, RB_PAGETYPE_MAYBE_HTML, and RB_PAGETYPE_RB.
ToC_getLength() and ToC_getPos() -- the values used to find where the associated data exists in the .rb file.
ToC_getFlags() -- the ToC flags can be RB_TOCFLAG_ENCRYPTED, RB_TOCFLAG_INFOPAGE, RB_TOCFLAG_DEFLATED, RB_TOCFLAG_MENUMARK_FILE, and RB_TOCFLAG_UNJOINED_FRAGMENT. The last two are generated internally (rather than being read from the .rb file), and can be masked off by and-ing the flags with 0xFF.
ToC_getMarkupHead() -- the pointer to the list of RbMarkup items attached to this page. The list is sorted in position order.
ToC_getUnderlineCnt() -- the count of how many of the items in the markup list are underlines.
ToC_getNoteCnt() -- the count of how many of the items in the markup list are notes.
ToC_getBookmarkCnt() -- the count of how many of the items in the markup list are bookmarks.
ToC_getNext() -- the pointer to the next ToC object or NULL.
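
To make the 0xFF masking of ToC_getFlags() concrete, here is a small sketch. The numeric values are hypothetical (the real ones are in rbfile.h). The only property taken from the text above is that the two internally generated flags sit outside the low 8 bits.

```c
/* Hypothetical flag values: on-disk flags fit in the low byte,
 * internally generated flags sit above it. */
enum {
    DEMO_TOCFLAG_ENCRYPTED         = 0x001,
    DEMO_TOCFLAG_INFOPAGE          = 0x002,
    DEMO_TOCFLAG_DEFLATED          = 0x008,
    DEMO_TOCFLAG_MENUMARK_FILE     = 0x100,  /* internal only */
    DEMO_TOCFLAG_UNJOINED_FRAGMENT = 0x200   /* internal only */
};

/* Keep only the flags that would have been read from the .rb file. */
unsigned demo_on_disk_flags(unsigned flags)
{
    return flags & 0xFFu;
}
```

Masking leaves flags such as the deflated bit untouched and drops only the in-memory additions.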

The RbMarkup object has the following accessors:

RbMarkup_getType() returns what type this markup item represents: RB_MARKUP_UNDERLINE, RB_MARKUP_BOOKMARK, or RB_MARKUP_NOTE.

RbMarkup_getStart() and RbMarkup_getEnd() represent the offsets in the page that is marked up. An underline item is the only one that has a useful end value (which indicates the position of the final character in the underlined text). Note that the note object's start offset points to the letter that the note should come after, whereas the bookmark object points to the letter that the bookmark should come before.

RbMarkup_getText() returns the associated text for a markup, if available. For a bookmark, this is the associated menu text (which may have been edited by the user). For a note, this is the entire text of the note. There is no text value for an underline.

RbMarkup_getNext() returns a pointer to the next markup item.
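
The offset conventions are easy to get off by one, so here is a small sketch of how the start/end values could be turned into positions in the page text. The DEMO_ names are hypothetical, and the sketch assumes 0-based offsets and an end value that is not less than the start value.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the RB_MARKUP_* type values. */
typedef enum {
    DEMO_MARKUP_UNDERLINE,
    DEMO_MARKUP_BOOKMARK,
    DEMO_MARKUP_NOTE
} DemoMarkupType;

/* Index at which a note or bookmark marker would be spliced into the page.
 * A note goes after the letter at `start`; a bookmark goes before it.
 * (Only meaningful for notes and bookmarks.) */
size_t demo_insert_index(DemoMarkupType type, size_t start)
{
    return (type == DEMO_MARKUP_NOTE) ? start + 1 : start;
}

/* Number of characters covered by an underline: `end` is the position of
 * the final underlined character, so the range is inclusive. */
size_t demo_underline_length(size_t start, size_t end)
{
    return end - start + 1;
}
```

So a note and a bookmark with the same start offset land on opposite sides of the same letter.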

The RbFile_create() function starts a new .rb file, and is only used by someone who really knows exactly what low-level "pages" they want to include in the .rb file (use the RbMake class for a higher-level interface for creating new .rb files). This function writes out the necessary header information and makes the file ready for data to be appended to it. If you know the exact number of ToC items that you will need, specify it as an argument to the function; otherwise, use "0" and room will be left for the maximum number of ToC entries (temporarily -- the file will be compacted to remove any wasted space when it is done).

Use RbFile_close() to close a file open for reading or writing. If the file is being created, it will be finalized -- i.e. it will be padded with Ctrl-As (for some arcane reason), the ToC entries will be written, all the internal pointers will be updated, and it will be compacted (if needed).

Once the file is open, you can find a particular name within the ToC list by using the RbFile_find() function. It returns the ToC object for the given name or NULL if not found. Note that you won't find any entries that you didn't tell the open call you wanted to know about, so don't expect to be able to find a .hidx page unless you specified RB_OPENFLAG_INCLUDE_HIDDEN.
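
Conceptually, the lookup is a walk over the ToC list comparing names, which is why a page that was filtered out at open time cannot be found. The sketch below shows the idea with hypothetical stand-ins (DemoToC, demo_find()). It is not the library's implementation, and the exact, case-sensitive comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's ToC object. */
typedef struct DemoToC {
    const char *name;
    struct DemoToC *next;
} DemoToC;

/* Return the first entry whose name matches exactly, or NULL if none does. */
DemoToC *demo_find(DemoToC *head, const char *name)
{
    for (DemoToC *toc = head; toc != NULL; toc = toc->next) {
        if (strcmp(toc->name, name) == 0)
            return toc;
    }
    return NULL;
}
```

If the list was built without the hidden pages, a search for a .hidx name simply falls off the end of the list and returns NULL.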

The RbFile_readPage() function is used to read a single entry's data from the .rb file. You supply the RbFile pointer and a pointer to the ToC entry you wish to read. If you want all the data pushed into an MBuf object, just supply the MBuf as the user pointer with a NULL for the callback function. If you want to process the data in smaller chunks, you can specify any pointer you like as the user pointer plus a callback function pointer, and the callback function will get called with the data chunks and your supplied pointer. The returned data is uncompressed (as needed), but otherwise unchanged from its raw form.
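
The chunked-callback style can be pictured with the sketch below. It is a self-contained model of the pattern, not the library's code: the callback type, demo_read_in_chunks(), and the DemoTally struct are all hypothetical, and the real callback signature should be taken from rbfile.h.

```c
#include <stddef.h>

/* Hypothetical callback shape: one chunk of data plus the caller's pointer. */
typedef void (*DemoChunkFn)(const char *chunk, size_t len, void *userPtr);

/* Stand-in for a chunked reader: hands `data` to `cb` in pieces of at most
 * `chunkSize` bytes, passing `userPtr` through untouched. */
void demo_read_in_chunks(const char *data, size_t len, size_t chunkSize,
                         DemoChunkFn cb, void *userPtr)
{
    if (chunkSize == 0)
        return;  /* nothing sensible to do with zero-sized chunks */
    for (size_t off = 0; off < len; off += chunkSize) {
        size_t n = (len - off < chunkSize) ? len - off : chunkSize;
        cb(data + off, n, userPtr);
    }
}

/* Example user data: a running tally kept across callback invocations. */
typedef struct {
    size_t bytes;
    size_t calls;
} DemoTally;

/* Example callback: ignores the chunk contents and just updates the tally. */
void demo_tally_chunk(const char *chunk, size_t len, void *userPtr)
{
    DemoTally *tally = userPtr;
    (void)chunk;
    tally->bytes += len;
    tally->calls++;
}
```

The user pointer is what lets the callback keep state between chunks without any globals.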

The RbFile_writePage() function is what is used to add a new page to a file that you are creating. You must supply the page name, the page-type (one of RB_PAGETYPE_HTML, RB_PAGETYPE_TEXT, RB_PAGETYPE_IMAGE, RB_PAGETYPE_AUDIO, RB_PAGETYPE_INFO, RB_PAGETYPE_HIDX, or RB_PAGETYPE_HKEY), the page Flags, and an MBuf object that contains the entire data for the "page" section (there is no incremental write function available). This function writes out the data (compressing it if RB_TOCFLAG_DEFLATED was specified in the page flags and it's at least 128 bytes long), and adds a new ToC object onto the linked list (which will be written out during the finalization of the .rb file).
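
The compression rule just described boils down to a two-part test, sketched here. The flag value and the names are hypothetical stand-ins. Only the 128-byte threshold and the dependence on the deflated flag come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TOCFLAG_DEFLATED 0x08u  /* hypothetical value for RB_TOCFLAG_DEFLATED */
#define DEMO_MIN_DEFLATE_LEN  128u   /* pages shorter than this are stored as-is */

/* Would the page data be compressed when it is written? */
bool demo_should_deflate(unsigned pageFlags, size_t dataLen)
{
    return (pageFlags & DEMO_TOCFLAG_DEFLATED) != 0
        && dataLen >= DEMO_MIN_DEFLATE_LEN;
}
```

A 127-byte page is therefore written uncompressed even when the deflated flag is requested.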

In addition to the various higher-level functions, there are also a bunch of very low-level read/write functions that handle bytes, int16s, int32s, etc. These low-level routines are mainly just for internal use, but they are also available for people who really want to read/write the fundamental types for themselves. Look in the rbfile.h file for a list of their names.

Finally, the ToC_new() function returns a new ToC object based on the parameters you pass in. You don't normally need to call this for yourself since the open and write functions do this for you.

Unjoin support

The RbMake routines have an option to join all the HTML pages in a book into one or more joined-html pages. To make this happen, each (formerly separate) HTML page receives a page-break and a NAME tag marking the start of each source page. Then, all existing HREF and NAME/ID references are "mangled" to refer to the appropriate parts of the joined result.

The RbFile_open() function has an option named "RB_OPENFLAG_UNJOIN" that causes a joined HTML page to be parsed to determine where all the original pages were. Then, the tocHead linked list of ToC objects is populated with objects that have the RB_TOCFLAG_UNJOINED_FRAGMENT flag set (and the original, joined HTML ToC objects are placed into a list on the tocUnjoin pointer). This allows you to browse the list of the original pages on the tocHead list.

If you call RbFile_readPage() on one of these unjoined fragments, you'll get just the data from that piece of the joined page object because the routine has been enhanced to know how to refer to the data inside the original joined page. The only thing that you'll notice different from the original pages is that the HREF and NAME/ID values are still "mangled". If you'd like to demangle them, you can use the substitution rules in the file "unjoin.subst" combined with the routines in the Subst class to do the actual work. To see this in action, look at the code in rbburst.c.

Another place that the demangling of HREF/NAME/ID attributes takes place is in the page-building HTML parsing routines inside rbhtml.c (which is used by the RbMake class). This means that it is possible to build a new .rb book using the unjoined fragments from a joined .rb file and all the links will still be intact. The rbmake.c code makes use of this feature.

Here's an example that demonstrates page joining and unjoining using the command-line tools supplied with this library. First, I create a joined .rb file like this:

rbmake -j -o joinedBook foo.html bar.html baz.html

I can turn it into a working unjoined book like this:

rbmake -u -o unjoinedBook joinedBook.rb

Or I can "burst" it into its component pieces (with the links properly demangled):

rbburst -u joinedBook.rb