The magazine of the Melbourne PC User Group

SGML/XML/XHTML resources - Part 3
Major Keary
majkeary@netspace.com.au

Hypertext

Hypertext has been around for a long time; the term was coined by Ted Nelson in 1965, but the concept dates back to 1945 when Vannevar Bush proposed a device, which he called Memex, "that would emulate a mind in its associative linking of items of information and their retrieval" (Encyclopedia of Computer Science, 3rd edn.). The first widely available hypertext applications were Guide, introduced by OWL in 1986, and HyperCard, released by Apple in 1987.

As a matter of interest, as far back as 1988 a hypertext-based multimedia interactive game, Manhole, was published on CD for the Mac.

Often overlooked is that hypertext was intended to enable users to move from one part of a document to another by means of links. When the Web was conceived the linking facility was between - rather than within - documents in order to provide access to files on remote sites.

Graphical browser technology brought with it a demand for extended control over format, a development that overshadowed the main purpose of the hypertext markup language: to provide links between documents. Emphasis on good looks at the expense of structure has given rise to a new problem: browser vendors have vied with each other in the introduction of non-standard enhancements. The result is that a given HTML file may display inconsistently on different browsers (as well as on versions of the same browser).

The solution is XML, which focuses on document structure. Of special importance is that XML offers a standard for the representation of data on the Web by providing a simple syntax that can be read both by people and by machines.
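To illustrate the point about a syntax that both people and machines can read, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The order document, its element names, and the summarise_order function are invented for this illustration and do not come from any of the books reviewed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names are assumptions
# made up for this example, not part of any standard vocabulary.
ORDER_XML = """
<order id="1001">
  <customer>Melbourne PC User Group</customer>
  <item sku="A-17" quantity="2">XML handbook</item>
  <item sku="B-03" quantity="1">CSS pocket guide</item>
</order>
"""

def summarise_order(xml_text):
    """Parse the order and return its contents as plain Python values."""
    root = ET.fromstring(xml_text.strip())
    items = [
        (item.get("sku"), int(item.get("quantity")), item.text)
        for item in root.findall("item")
    ]
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": items,
    }

summary = summarise_order(ORDER_XML)
print(summary["customer"], len(summary["items"]))
```

The same text that a person can read at a glance is turned into structured values by a few lines of code, with no knowledge of how the document is meant to look on screen.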

Electronic Data Interchange (EDI) over the Internet requires two conditions: security, and an open, non-proprietary method of EDI data encoding. Security can be provided by virtual private network (VPN) protocols, and XML can provide a solution to the encoding prerequisite.

XML and Data On the Web

Data on the Web, a professional title from Morgan Kaufmann, is an important text about the application of XML to the representation, retrieval, and processing of information.

It is not a book for ordinary Web users, database users, or even Web site designers whose main concern is appearance. However, anyone with an interest in database fundamentals should find the introduction interesting and informative. Some chapters are straightforward discussions of XML programming, but much of the text assumes an understanding of programming and a knowledge of predicate calculus and group algebras.

The book is designed for three audiences: the first is primarily developers and researchers involved in the design and development of tools for the management of data on the Web; the second is those teaching or learning data management; and the third is system managers with a responsibility for publishing data on the Web. As already mentioned, some chapters are straightforward XML; they don't require any special knowledge other than of XML programming. The main body of the book is in four parts:

Data Model: Much of the data on the Web is semistructured, lacking any description of the type or structure of the data. Other data is in the traditional relational-database or object-database format. The authors discuss the ways in which XML can be applied across those data types. I was taken by the description of ACeDB (A C. elegans Database), which was developed for the genetics of a small worm. It (the database, not the worm) happens to have a model and format useful for much wider purposes.

Queries: Discusses query methodology and languages. A chapter that will be of interest to XML programmers deals in depth with query languages for XML, and includes discussion of XSL (stylesheet language) and XML-QL (query language).

Types: This part, which is highly technical, puts forward some foundations for types in semistructured data.

Systems: Discusses query processing, particularly in respect of semistructured data, and describes two systems, Lore and Strudel. Lore (Lightweight Object REpository) was developed at Stanford University to manage semistructured data. Strudel, a Web site manager, is one of AT&T's research projects and can be downloaded; just search for "strudel" and pick the AT&T site - the rest are to do with edible strudel. A fair bit of documentation is also available. One of the things Strudel does is manage semistructured data, and it has its own query language, StruQL. An interesting HTML-specific application.
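The idea of semistructured data described under Data Model can be sketched in a few lines of Python. This is only a loose illustration: the records and the find_values helper are hypothetical, and the search shown is not the query language of Lore, Strudel, or the book. The point is that records in one collection need not share a schema, so a query has to follow labels wherever they occur.

```python
# Hypothetical records: each describes itself through its labels, and no two
# share exactly the same structure.
people = [
    {"name": "Alice", "email": "alice@example.org"},
    {"name": {"first": "Bob", "last": "Lee"}, "phone": ["555-0100", "555-0101"]},
    {"name": "Carol", "works_for": {"org": "Example Labs"}},
]

def find_values(node, label):
    """Return every value stored under `label`, at any depth of nesting."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == label:
                found.append(value)
            # Keep descending: the label may also occur further down.
            found.extend(find_values(value, label))
    elif isinstance(node, list):
        for element in node:
            found.extend(find_values(element, label))
    return found

names = find_values(people, "name")
orgs = find_values(people, "org")
print(len(names), orgs)
```

A relational table would force every record into the same columns; here the differing shapes are simply tolerated and discovered at query time.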

For anyone working, teaching, or studying at the high end of data management and retrieval this is an important contribution to the literature.

Abiteboul, Buneman, and Suciu: Data on the Web
ISBN 1-55860-622-X
Published by Morgan Kaufmann,
257 pp. hc, RRP $74.95


XML and Java for the Web

The full title of this book is XML and Java: Developing Web Applications and it is one of several high-level texts on XML that have come out of Japan. In this case, the lead author is an Associate Professor of Computer Science and Manager of Network Applications at IBM's Tokyo Research Laboratory. Readers are assumed to be Java-literate, to know something about XML, and to have some experience in writing Web applications. It contains a lot of code listings (all on a companion CD) and the text is technical.

What the book does deliver is a thorough description of how to interface Java and XML for Web applications. A number of real-world sample programs are used to illustrate the process and issues. One, LMX, shows how two or more entities can share access to their respective databases. The capacity to query databases is essential to commercial applications, and there is a detailed discussion of interfacing databases. A sample program, SQLX (SQL embedded in XML), shows how this is done.

The most complex application used to illustrate techniques for combining XML with Java is a travel planning application that enables access to several sites (airline bookings, hotel accommodation, and so on) and returns an itinerary. It incorporates the beans that have already been described in the first part of the text. Appendices contain extensive URL lists and other resources. A thorough, technical treatment of the subject for developers.

Maruyama et al.: XML and Java - Developing Web Applications
ISBN 0-201-48543-5
Published by Addison-Wesley,
386 pp. + CD, RRP $59.95

Cascading Style Sheets (CSS)

Eric Meyer, author of a new O'Reilly title, Cascading Style Sheets: The Definitive Guide, says, "Cascading Style Sheets is a standard way to separate a document's structure from its presentation [which may] sound very abstract, but the benefits are quite surprising and profound". CSS enables a "centralised description of web document appearance", but its strengths and weaknesses need to be understood.

There are two standards, CSS1 and CSS2; even though CSS2 is a W3C Recommendation, it is of no current practical use because available browsers have yet to implement it. The author takes the view that discussing CSS2 is a waste of time until there is effective, widespread browser support.

Eric Meyer, a well-known author and acknowledged HTML and CSS expert, presents a warts-and-all account of CSS as it is supported by current browsers. There is even a chart that shows which browsers on Win95 and Mac platforms support which features. As the support is built into the browser, one can assume that what suits Win95 also suits Win98 and Win2000.

Whether you are a CSS novice, an expert, or somewhere in between, this is a really useful guide, tutorial, and ongoing reference for the implementation of cascading style sheets and their practical application. It is unusual to find a text that genuinely provides - without being a mountainous volume - for all levels of user, but this one does just that. A reasonable grasp of creating HTML documents is assumed, but does not have to be much more than being able to get "Hello World!" to display properly on one's browser.

There is not room here to dilate upon the advantages of CSS; suffice it to say that HTML and XHTML are steadily deprecating format tags. In plain language that means future versions of browsers may not recognise them; instead, just as in SGML, presentation features will be handled by style sheets. It is interesting to note that the Wireless Application Protocol (WAP) uses CSS to display content on wireless devices.

HTML users have to face the fact that CSS will not be an optional add-on, but a necessary tool for web authors.

There are other CSS titles, but this is the benchmark for those who want a complete guide to real world CSS. Good and better things may be just over the horizon, but practical users need to know about today's tools. This is where they should start - and for most it will be all they will ever need.

Eric Meyer: Cascading Style Sheets - The Definitive Guide
ISBN 1-56592-622-6
Published by O'Reilly,
453 pp., RRP $69.95

Strudel?

Strudel is a nickname for the 'commercial at' sign, @, which is the likely reason for its use in the name of a Web-related application. 'Commercial at' is the official ISO and HTML designation, but @ is known by many names: in Italian, chiocciola (snail); in Flemish, apestaart (monkey tail); in French, arobase; and in Hebrew, shtrudel. According to the Encyclopaedia of Graphics Communications it is known in the American printing industry as cinnamon bun.

The sign, @, is most likely a combination of two letters (just as '&' is a ligature of 'et'). The Latin word ad (at) appears to have been converted to a single symbol by writing the a and then curling the stem of the d, dropping its bowl. It has been in typographic use for centuries. For an interesting account, go to http://www.art-bin.com/art/asignoftimes.html.

Reprinted from the July 2000 issue of PC Update, the magazine of Melbourne PC User Group, Australia