GrabUrl Functions

Overview

These functions fetch web pages over HTTP. The rbmake library may call these routines internally if libwww was not compiled into the library.

You must call GrabUrl_init() with a GrabUrlAskForAuthInfoFunc callback pointer if you want to be able to fetch password-protected web pages. If you don't need this, pass a NULL pointer instead.

Use GrabUrl_read() to specify a URL to fetch and an MBuf object in which to place the resulting web page. If you specify NULL for the MBuf object, a new one will be created for you. The return value is a pointer to the MBuf object.

Use GrabUrl_setAuthInfo() to specify a URL, a realm string (or NULL), plus a base-64-encoded "username:password" string that should be used to access this URL. The supplied data is hashed and stored in an internal array for use by the URL-fetching routines. If the URL is the string "proxy" (case ignored), then the authorization information for accessing the proxy server is stored. See the rbBase64Encode() routine for an easy way to create a base-64-encoded string.
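To make the expected credential format concrete, here is a minimal sketch of base-64 encoding a "username:password" string. The function demo_base64() is a hypothetical stand-in written for this example; it is not rbBase64Encode(), whose signature is not shown in this section, and in real code you would call the library routine instead.

```c
#include <stddef.h>

/* Hypothetical helper (not part of rbmake): encode len bytes from src
 * into dst as base-64 with '=' padding.  dst must have room for
 * 4 * ((len + 2) / 3) + 1 bytes.  Returns dst. */
static char *demo_base64(const unsigned char *src, size_t len, char *dst)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    char *out = dst;
    size_t i;

    /* Full 3-byte groups become 4 output characters. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)src[i] << 16)
                        | ((unsigned long)src[i + 1] << 8)
                        | (unsigned long)src[i + 2];
        *out++ = tbl[(v >> 18) & 63];
        *out++ = tbl[(v >> 12) & 63];
        *out++ = tbl[(v >> 6) & 63];
        *out++ = tbl[v & 63];
    }

    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)src[i + 1] << 8;
        *out++ = tbl[(v >> 18) & 63];
        *out++ = tbl[(v >> 12) & 63];
        *out++ = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        *out++ = '=';
    }
    *out = '\0';
    return dst;
}
```

With this sketch, the 9-byte string "user:pass" encodes to "dXNlcjpwYXNz", which is the kind of value GrabUrl_setAuthInfo() expects as its credential argument.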

The GrabUrl_setHttpHeader() routine takes a header string and puts it into the current header info used for fetching web pages. If the string is of the form "Header: value", then "Header" is added or changed. If the string is of the form "Header:" (with nothing after the colon, not even a space), then "Header" is removed from the cached headers. The return value is 1 if a change was made, 0 if no change was needed, or -1 if the string supplied was not in the right format.
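The add/change/remove rules and the three return values can be modelled with a small toy header cache. This is only a sketch of the behaviour described above, not the library's implementation: demo_set_header(), the fixed-size table, and the case-sensitive name match are all assumptions made for the example, as is rejecting a colon that is followed by anything other than a space.

```c
#include <string.h>

#define MAX_HEADERS 16
#define MAX_LINE    256

/* Toy header cache (hypothetical, for illustration only). */
static char headers[MAX_HEADERS][MAX_LINE];
static int header_count = 0;

/* Returns 1 if the cache changed, 0 if no change was needed,
 * or -1 if the line is malformed (or, in this toy, does not fit). */
static int demo_set_header(const char *line)
{
    const char *colon = strchr(line, ':');
    size_t name_len;
    int i, idx = -1;

    if (colon == NULL || colon == line || strlen(line) >= MAX_LINE)
        return -1;
    /* Accept only "Header:" (removal) or "Header: value" (set). */
    if (colon[1] != '\0' && colon[1] != ' ')
        return -1;
    name_len = (size_t)(colon - line) + 1;   /* name plus the colon */

    /* Look for an existing entry with the same name. */
    for (i = 0; i < header_count; i++) {
        if (strncmp(headers[i], line, name_len) == 0) {
            idx = i;
            break;
        }
    }

    if (colon[1] == '\0') {                  /* "Header:" removes */
        if (idx < 0)
            return 0;
        for (i = idx; i + 1 < header_count; i++)
            strcpy(headers[i], headers[i + 1]);
        header_count--;
        return 1;
    }

    if (idx >= 0) {                          /* existing header */
        if (strcmp(headers[idx], line) == 0)
            return 0;
        strcpy(headers[idx], line);
        return 1;
    }

    if (header_count == MAX_HEADERS)         /* toy table is full */
        return -1;
    strcpy(headers[header_count++], line);
    return 1;
}
```

In this model, setting "Accept-Language: en" twice yields 1 and then 0, changing it to "Accept-Language: fr" yields 1, and "Accept-Language:" removes it and yields 1.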

The GrabUrl_askForAuthInfo() routine takes the provided URL and a nano-http context pointer and checks to see if we already know what password goes with this realm. If not, we call the user-supplied authorization-prompting routine (if available). If we found new authorization info for the realm, or if the user supplied some, we return a read-only pointer to the Authorization header that was just added to the header cache (meaning that it is already set and ready to be used for the next page fetch). If we found no information, or if the information was the same as the current authorization header in the cached headers, we return NULL.

GrabUrl_askForAuthInfo() is called automatically when a page fetch fails due to an authorization error. Note that the RbFetch_* routines and the GrabUrl_read() function share the same authorization/header info.

The GrabUrl_getHttpHeaders() routine returns a buffer that contains all the HTTP headers currently cached for the fetching of web pages. Each header is on a single line that ends with a CR + LF pair. The string includes a User-Agent header (unless it was unset by a manual call to GrabUrl_setHttpHeader()), any proxy authorization info that was set via GrabUrl_setAuthInfo(), and any per-page authorization info that was set via either GrabUrl_setAuthInfo() or GrabUrl_askForAuthInfo().
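Because every header ends with a CR + LF pair, the returned text is easy to walk line by line. The sketch below assumes the headers are available as a NUL-terminated C string; how you obtain that string from the returned buffer is not shown here, and demo_count_headers(), demo_has_header(), and the sample header values used to exercise them are hypothetical.

```c
#include <string.h>

/* Count the CRLF-terminated header lines in buf. */
static int demo_count_headers(const char *buf)
{
    int n = 0;
    const char *eol;

    while ((eol = strstr(buf, "\r\n")) != NULL) {
        n++;
        buf = eol + 2;                 /* skip past the CR + LF */
    }
    return n;
}

/* Return 1 if some line in buf starts with "name:", else 0.
 * The comparison is case-sensitive to keep the sketch short. */
static int demo_has_header(const char *buf, const char *name)
{
    size_t name_len = strlen(name);
    const char *eol;

    while ((eol = strstr(buf, "\r\n")) != NULL) {
        if ((size_t)(eol - buf) > name_len
            && strncmp(buf, name, name_len) == 0
            && buf[name_len] == ':')
            return 1;
        buf = eol + 2;
    }
    return 0;
}
```

A check such as demo_has_header(text, "User-Agent") is one way to tell whether the User-Agent header has been unset.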

Callbacks

If you supply a GrabUrlAskForAuthInfoFunc pointer to the init routine, your function will receive the URL and the realm when we need to ask the user for a username and a password to access a web page. Your routine should pass these items, plus the base-64-encoded "username:password" string built from what the user supplies, to the GrabUrl_setAuthInfo() function, and return the string that GrabUrl_setAuthInfo() returns.
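The flow of such a callback might look like the sketch below. Everything here is an assumption made for illustration: the callback shape (URL and realm in, string out), the stand-in demo_setAuthInfo() with its argument order and returned header text, the canned demo_prompt(), and returning NULL when the user cancels. Check the rbmake headers for the real declarations before adapting it.

```c
#include <stdio.h>

/* Hypothetical stand-in for GrabUrl_setAuthInfo(): records the
 * credentials and returns the resulting Authorization header. */
static char demo_auth_header[256];

static const char *demo_setAuthInfo(const char *url, const char *realm,
                                    const char *b64_user_pass)
{
    (void)url;
    (void)realm;
    snprintf(demo_auth_header, sizeof demo_auth_header,
             "Authorization: Basic %s", b64_user_pass);
    return demo_auth_header;
}

/* Hypothetical prompt.  A real program would ask the user for a
 * username and password and base-64 encode "username:password";
 * this sketch returns a canned value, the encoding of "user:pass". */
static const char *demo_prompt(const char *url, const char *realm)
{
    (void)url;
    (void)realm;
    return "dXNlcjpwYXNz";
}

/* Assumed callback shape: receives the URL and the realm, returns
 * whatever the set-auth-info routine returns. */
static const char *my_ask_for_auth(const char *url, const char *realm)
{
    const char *b64 = demo_prompt(url, realm);

    if (b64 == NULL)        /* assumption: NULL means the user cancelled */
        return NULL;
    return demo_setAuthInfo(url, realm, b64);
}
```

In a real program, my_ask_for_auth would be the pointer handed to GrabUrl_init(), and demo_setAuthInfo() would be replaced by a call to GrabUrl_setAuthInfo().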