| Web Techniques May 1997 |
Despite all the hoopla over new database-connectivity solutions such as JDBC, ODBC, 3-tier, and RAD, alternative, low-cost options are available. Most of the press coverage is concentrating on "real-time connectivity," where one database services both internal needs (such as employees entering inventory) and external functions (like Web-site visitors checking price and availability). "Offline database connectivity," on the other hand, uses a conventional database to store and manipulate internal information; Web-site visitors extract information from a copy of the database, typically Web pages that reflect the data as it stood an hour or a day ago.
Because real-time connectivity is still complicated and expensive, effective use of banal, offline database connectivity has merit. This is what we discovered when we built our site, Good Time Net (www.goodtime.net), a recreational information directory for Orange County, CA. To meet the needs of local residents interested in things to do near where they lived or worked, Good Time Net offers pages for recreational events and destinations--3000 pages initially, with a goal of up to 50,000. Given the schedule changes common to recreational events, we knew we would have a maintenance problem. The requirement for database connectivity was self-evident.
Firewall. To ensure the integrity of our internal database, we had to use a firewall. This carries a monetary expense, plus a systems-integration cost. Furthermore, firewalls can err, hindering a legitimate request or allowing an illegitimate one. There is usually a way to fix the problem, but you incur cost, delays, aggravation, and risk. The alternative was to present Good Time Net's information as, for example, a day old. That allowed our internal database to publish the Web pages and update the site via FTP with no firewall or other security precautions.
Graceful degradation. We expected peak usage (after school or approaching a weekend) of more than a thousand concurrent users. Servicing that many real-time CGI or API calls interfacing to a database multiplied our hardware needs. Good Time Net's events and destinations are stored on Web pages under 3KB each; graphics such as maps and photos under 30KB each. Overworked Web servers degrade gracefully in parceling out such Web pages--users get at least something to look at. Real-time calls to a database, on the other hand, complete in series--queries are normally executed first-in first-out. The submit button forms an SQL query from the user's entries; this is passed to the database, which produces a report, which is converted into an HTML document for viewing. You can't display a partial result of this to waiting users.
This is critical to Web developers because browsers time out if they receive nothing back in 120 seconds. This has to do with the nature of packet transmission across stateless networks (nothing is tracking requests). The result of a timeout is a nonspecific error message that users interpret as a defective Web site. If a Web site-to-database-to-Web site trip takes one second for the average query, the 121st concurrent user will fail without a graceful degradation. If they hit the submit button again and join with other failed users, the combinatorial explosion will paralyze the system, which will not resume normal functioning until many users grow frustrated enough to leave. None of us wanted that as an operational design parameter. The only solution was to mirror the Web site, the database, and the database's Web site interface, which was the process bottleneck.
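The arithmetic behind that 121st user can be sketched in a few lines of Python. This is only an illustrative model, not code from Good Time Net: it assumes strictly first-in first-out processing, one query at a time, and the one-second average and 120-second timeout quoted above. The function names are hypothetical.

```python
TIMEOUT_SECONDS = 120    # browser timeout quoted in the text
SERVICE_SECONDS = 1.0    # average round trip per query, from the example above

def wait_time(position, service_seconds=SERVICE_SECONDS):
    """Seconds until the user at a 1-based queue position gets a complete
    response, assuming strict FIFO and one query processed at a time."""
    return position * service_seconds

def first_failing_position(timeout=TIMEOUT_SECONDS,
                           service_seconds=SERVICE_SECONDS):
    """Smallest queue position whose wait exceeds the browser timeout."""
    position = 1
    while wait_time(position, service_seconds) <= timeout:
        position += 1
    return position

print(first_failing_position())                     # one-second queries
print(first_failing_position(service_seconds=2.0))  # slower queries fail sooner
```

Under these assumptions, doubling the average query time roughly halves the number of concurrent users the queue can carry before timeouts begin.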
In the offline process, Web pages are immediately available for viewing. Even on a dusty old Pentium, the system could service thousands of concurrent users with the built-in capability of all Web servers to parcel out pieces of a Web page to queued users, thus avoiding the 120-second timeout. Mirroring in response to performance degradation then requires copies of only the Web server, not of the database that produces its content.
As the size of the database grows, real-time response deteriorates. In offline DB-Web connections, you simply break the prepackaged lists into smaller Web pages as necessary. For example, break a large alphabetized list into an A-J page and a K-Z page, and everything else works the same; you just have one more page. It seemed everywhere we looked, the dynamic approach had throughput walls while the offline approach offered graceful degradation.
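Splitting a long list this way takes very little logic. The following Python sketch is illustrative only (we did this work in our database, not in Python). The function name, the letter ranges, and the sample city names are assumptions, and every name is assumed to be non-empty and to start with a letter.

```python
def split_by_letter_ranges(names, ranges):
    """Group names into pages keyed by letter range, e.g. "A-J" and "K-Z".

    `ranges` is a list of (first_letter, last_letter) pairs. Names whose
    initial falls outside every range are left out of the result.
    """
    pages = {f"{lo}-{hi}": [] for lo, hi in ranges}
    for name in sorted(names, key=str.upper):
        initial = name[0].upper()
        for lo, hi in ranges:
            if lo <= initial <= hi:
                pages[f"{lo}-{hi}"].append(name)
                break
    return pages

# Sample data for illustration only.
cities = ["Laguna Beach", "Anaheim", "San Clemente", "Irvine"]
for label, members in split_by_letter_ranges(cities, [("A", "J"), ("K", "Z")]).items():
    print(label, members)
```

When a page grows too long again, only the list of ranges changes; the pages the lists link to are untouched.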
Throughput. It will always take longer to produce a Web page on the fly and then transmit it to the user than to just transmit it from storage. The offline approach does carry an additional storage cost, because more prepackaged Web pages (query results or alternate navigational tools) must be ready to transmit. But hard drives cost less than $0.20 per MB and the price is falling, whereas increasing real-time processing horsepower costs $10,000 per Web server.
Obsolescence. Every company with a database product or a programming language is racing to release products to address the Web connectivity market. (See the September '96 issue of Web Techniques for more details on the plethora of options.) Because the market shakeout will probably yield no more than a dozen survivors, our odds of picking the right horse seemed little better than one in 10. We almost chose Oracle's SQL tools, but decided to postpone choosing a real-time database for several months. So, we built the Good Time Net using a familiar database that could easily handle an offline approach (which involves simple text-file printing with tags as ASCII characters).
Reliability. The field of software tools for real-time DB-Web connectivity is crowded with alphas, betas, and an occasional version 1.0. The trade journals showed comparison tables of features and functions but never a reliability index or a tested compatibility list. We decided to keep evaluating new software, but not to go boldly where no one has gone before.
Fast prototyping. The development methodology shifted from delivering the killer app to one of rapidly prototyping the intermediate solution. We opted for the quick and dirty, and went with a little-known database called VersaForm XL because it supported rapid prototyping (a simple programming language, and a form-based approach to building relations between tables). It had the essential features, and because it was a DOS-based tool (that ran fine under Windows), its publisher had stopped adding new features years ago. Its programming language was Pascal, which even management knew, and no one could recall it crashing in years. The point is not that you should go out and get VersaForm to do offline DB-Web development, but rather that you can use almost any relational database for this job, so use the one you're most familiar with. Demonstrate the key concepts in days, not months, and go back to the drawing board often to implement feedback.
Web directories. One advantage to offline DB-Web connectivity didn't occur to us during our deliberations, but I can include it in hindsight. Web pages produced by a real-time connection to the Web server are available to the user requesting them, but not to the robot of a Web site directory service such as Alta Vista or WebCrawler. If we prepare that page offline, it is available to such a robot and may be catalogued in its directory, thus increasing our traffic.
Budget. The Good Time Net began as a community resource teamed with Orange County's Harbors Beaches & Parks, as well as some local city governments, membership organizations, and nonprofits. These are not rich outfits, and in fact, all funding came from our company's IR&D and goodwill accounts, so pinching pennies was important.
An environment to provide real-time DB-Web interface costs between $1000 and $5000 just for the software tools--this in a world where comprehensive DOS/Windows database development environments are under $500. It would not be fair to allocate all the learning and debugging costs of mastering the new software to this project, since some of that would be applicable elsewhere, but that still meant agreeing to sign paychecks for an additional $10,000. Add to that the need for firewalls and additional Web server processing power, and the hardware/software tab came in at around $25,000. Compare this with Web hosting costs of under $100/month, which would be our only Web-related expense using offline DB-Web connectivity.
Our plan required designing all the navigation paths ourselves, publishing them as Web pages, and then FTP-ing them to the Webserver. Printing to file a Web page for each recreational destination or event was easy--thousands of records printing thousands of corresponding files with HTML tags--but the path to get the user to each page took thought.
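The "print to file" step is nothing more than wrapping each record's fields in tags and writing the result under a predictable file name. Here is a minimal sketch in Python rather than in the database's own language. The field names `title` and `description`, the file-naming scheme, and the helper names are all assumptions made for illustration.

```python
import html
from pathlib import Path

def slug(text):
    """Turn a title into a lowercase file-name stem (hypothetical scheme)."""
    return "".join(c if c.isalnum() else "-" for c in text.lower()).strip("-")

def render_page(record):
    """Wrap one database record in minimal HTML tags."""
    title = html.escape(record["title"])
    body = html.escape(record["description"])
    return (
        "<html><head><title>" + title + "</title></head>\n"
        "<body><h1>" + title + "</h1>\n"
        "<p>" + body + "</p></body></html>\n"
    )

def render_all(records):
    """Return {file name: HTML text}, one entry per record."""
    return {slug(r["title"]) + ".html": render_page(r) for r in records}

def write_pages(pages, out_dir):
    """Write the rendered pages to a directory, ready to be sent by FTP."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in pages.items():
        (out / name).write_text(text, encoding="utf-8")
```

Because the file name is derived from the record, any list page can compute the link to an event page without looking anything up.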
Let's take a page describing a Calafia Bicycle Tour starting in San Clemente. The database would step through each record to produce a list of records with the keyword equal to bicycling and the location equal to San Clemente. This "Bicycling in San Clemente" list gathers perhaps a dozen records, arranges them in lines, and adds an HREF link to each corresponding Web page, including the one for the Calafia Bicycle Tour. Or, you could produce a master list for San Clemente listing the active keywords for San Clemente, with the bicycling keyword linking to the previous Bicycling in San Clemente page. Lastly, we can generate a list (or even an image map) of all active cities, with a link to the master list. The user would pick a city from a list, then a keyword from the next list, and then an event or destination from the last list. We had to make compromises. For example, to let users match events and destinations to their kids, we wanted to offer an age range; instead of keying in a number or range of numbers, users had to choose preschoolers, children, teens, or seniors.
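The three levels of lists described above (cities, then keywords within a city, then events within a keyword) can be sketched as follows. This is an illustrative Python model, not our VersaForm code. The field names `city`, `keyword`, and `title`, the `index-` file-name prefix, and the function names are assumptions.

```python
import html
from collections import defaultdict

def slug(text):
    """Lowercase file-name stem (hypothetical naming scheme)."""
    return "".join(c if c.isalnum() else "-" for c in text.lower()).strip("-")

def list_page(heading, links):
    """Render one list page; `links` is a list of (href, label) pairs."""
    items = "\n".join(
        f'<li><a href="{href}">{html.escape(label)}</a></li>'
        for href, label in links
    )
    return (f"<html><body><h1>{html.escape(heading)}</h1>\n"
            f"<ul>\n{items}\n</ul></body></html>\n")

def build_navigation(records):
    """Return {file name: HTML} for the city, keyword, and event lists."""
    tree = defaultdict(lambda: defaultdict(list))
    for r in records:
        tree[r["city"]][r["keyword"]].append(r["title"])

    pages = {"cities.html": list_page(
        "Cities", [(f"index-{slug(c)}.html", c) for c in sorted(tree)])}
    for city, keywords in tree.items():
        # Master list for one city: links to each active keyword.
        pages[f"index-{slug(city)}.html"] = list_page(
            city,
            [(f"index-{slug(city)}-{slug(k)}.html", k) for k in sorted(keywords)])
        for keyword, titles in keywords.items():
            # For example "Bicycling in San Clemente": links to event pages.
            pages[f"index-{slug(city)}-{slug(keyword)}.html"] = list_page(
                f"{keyword} in {city}",
                [(f"{slug(t)}.html", t) for t in sorted(titles)])
    return pages
```

Only keywords and cities that actually have records produce pages, which is what keeps the lists limited to "active" entries.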
We were able to add cross indexes, however, so that events and destinations were listed on different lists, including lists of the sponsoring organizations. As one might guess, the page that described the event was published with links to the destination and the sponsoring organization's home page, as well as custom pages with comments and notes. If the data exists in the database, the published Web page could contain it--as long as the linked pages all exist on the Web server.
The Good Time Net was officially launched in September '96 after 770 person-hours and $38,500. Most of that was labor and included database development beyond the scope of this article. Relations were established between events and locations so that users can click to the corresponding location from an event. An online event submittal form was also established; it posts to the database in an offline batch process that automatically reads the form responses into database records, then publishes the corresponding Web pages and updated cross-index lists.
The Good Time Net is keeping its maintenance costs below $500/month. Most of this goes to data entry and management, but that procedure reflects one of the principal reasons to select a DB-Web interface for information delivery Web sites. We change the date of an event on its database record, for example, and once a day the database publishes all updated Web pages and their many cross-index lists with one command. Because we have kept development and maintenance costs so low, we now sell ad space to companies offering recreational products or services in Orange County for $10/month. With comparatively few listings at such a low cost, we can more than cover costs and fulfill our mission of providing a community resource without the fiscally risky proposition of relying only on our goodwill account.
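The once-a-day publishing run comes down to one selection: find the records changed since the last run, then republish their pages and every list that mentions them. A hedged Python sketch of that selection follows. The field names `page`, `modified`, and `indexes` and the function name are hypothetical; our database did the equivalent with a single command.

```python
from datetime import datetime

def pages_to_republish(records, last_published):
    """Return (detail pages, index pages) that need regenerating.

    Each record is assumed to carry its own page file name, a `modified`
    timestamp, and the file names of the cross-index lists it appears on.
    """
    changed = [r for r in records if r["modified"] > last_published]
    detail_pages = sorted(r["page"] for r in changed)
    index_pages = sorted({name for r in changed for name in r["indexes"]})
    return detail_pages, index_pages
```

The set comprehension matters: a list page shared by several changed events is regenerated once, not once per event.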
| URL Resources |
| www.goodtime.net |
| ls6-www.informatik.uni-dortmund.de/ir/projects/freeWAIS-sf |