The magazine of the Melbourne PC User Group

Finding It On the Net - for the bookshelf
Major Keary

 

 
Many — and, I suspect, a rapidly growing number of — users do not distinguish between the Internet and the World Wide Web. Indeed, they may not even be aware that the term, Web, is an abbreviation. The Internet is no longer a marble pillar'd library in what the ignorati call cyberspace. It is a dump, mostly of garbage, but in which precious gems of knowledge — and even wisdom — may be found. One of the great inventions of our age has been the search engine that enables those who seek porn to find it, and those who seek pearls to find them without having to rummage through an overlay of rubbish.

Searching

The dictionary meaning of 'to search' is: "To go about (a country or place) in order to find, or to ascertain the presence or absence of some person or thing; to explore in quest of some object" [OED]; in ordinary parlance, to look for something.

All mammals look for food, shelter, and sex; those that fail in their search don't survive for long enough to pass on their genes. Many exhibit a sense of curiosity that leads them to search for things that don't meet their immediate need for survival; just as dogs go fossicking and dig holes in search of who knows what, many humans also fossick on the Internet just for the sake of it.

Early humans learned how to store data as pictures and symbols, and later as written records, that gave them the ability to search for and to communicate knowledge. The development of sophisticated writing systems enabled more formal records to be made, and as the corpus of recorded information expanded so did the means of storing it in order to find particular pieces of it on demand.

Librarians still form what is arguably the most efficient class of 'finders'; many people may search, but a librarian is most likely to find. Even computer-based information retrieval systems rely on human design and human input; a skilled searcher will get faster and better results than one who doesn't understand the search process.

The Internet is perceived by some as a quantum leap in the way we search for information. It was not a single startling scientific discovery; the Internet was built on the ideas of, and work done by, many people. Paul Baran and Donald Davies independently invented packet switching, without which the system would not work. Vannevar Bush conceived what Doug Engelbart later developed and Ted Nelson named hypertext, an essential ingredient in Tim Berners-Lee's World Wide Web.

Modern search engines are a logical extension of what has gone before and are a response to the huge — and growing — base of publicly available material. Their respective developers may not say so, but complex search engines are also a necessary response to the mountain of garbage that litters the Internet. The art is to make an intelligent guess of what users are looking for, to filter out the garbage, and to present meaningful search results in some order of relevance. An added complexity is that many users are less than competent when it comes to searching.
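
The filter-and-rank idea can be illustrated with a toy sketch. It is not how any real search engine works; the stop-word list, the spam markers, and the scoring rule (a simple count of query terms) are all assumptions made up for the illustration.

```python
import re
from collections import Counter

# Hypothetical stop list and spam markers, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for"}
SPAM_MARKERS = {"click here", "buy now"}


def tokenize(text):
    """Lower-case the text and split it into words, dropping stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def is_garbage(text):
    """Crude filter: treat a document containing any spam marker as garbage."""
    lowered = text.lower()
    return any(marker in lowered for marker in SPAM_MARKERS)


def search(query, documents):
    """Return (score, doc_id) pairs, best first, for documents matching the query."""
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        if is_garbage(text):
            continue  # filter out the rubbish before ranking
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)  # how often the query terms occur
        if score > 0:
            results.append((score, doc_id))
    # Highest score first; ties are broken alphabetically by document id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results
```

Calling search() with a query string and a dictionary that maps document names to their text returns the matching documents in descending order of score. A real engine weighs far more signals than a word count, which is precisely the complexity referred to above.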

The other side of the search coin is findability, a recently coined noun derived, logically, from findable. If a document, in the widest sense of the term, is not adequately labelled or described it is less likely to be found. Authors and Web publishers need to understand how search engines operate in order to make their work findable.

Ambient Findability

Finding anything — one's way, where one left the car keys, or information — involves human behaviour, a factor that I can't recollect having seen discussed in books about Web search engines; that is, with the exception of Ambient Findability.

The author, Peter Morville, ranks high amongst information architects and is a co-author of what I consider to be the best single text on information architecture (IA): Information Architecture for the World Wide Web. In Ambient Findability he turns his attention to 'findability', taking the reader on a fascinating journey in which search-related technologies are shown in a new light.

This is a book of great insights; the author does not engage in a pedagogical discourse — the law according to Morville — but puts forward his views and thoughts for consideration. This is a text for people with a sense of curiosity, especially those who want to be intelligently informed on the subject of information retrieval and making information more readily found — especially on the Web. It is both philosophical and technical, ranging over many topics including: information literacy; information interaction; human wayfinding; location-sensing devices; maps and charts; artificial intelligence; push and pull; search engine marketing, and defining the terms 'information', 'data', and 'knowledge'.

Even though not fully satisfied with the book's 'working definitions', I was pleased to see someone discuss the subject so lucidly. The Oxford English Dictionary devotes over 5800 words to 'information'; its meaning varies with context, information can be (and often is) false, and politicians often claim to "have no information". When Claude Shannon coined the term 'information theory' in 1948, 'information' meant something quite different from its use in information science, information systems, or the amorphous information technology.

There is a specific computer-context meaning for data:
"The quantities, characters, or symbols on which operations are performed by computers and other automatic equipment, and which may be stored or transmitted in the form of electrical signals, records on magnetic tape or punched cards, etc." [OED].

If you want to pick up some superior words, read this book. That's not intended as a criticism, or a sly dig at the author's writing; he is very careful with his words and does not leave his readers to the mercies of a dictionary. Anyone with an interest in writing should study Peter Morville's book; it is among the best examples I have seen of technical communication. There is no 'dumbing-down', or glossing over difficult concepts, and he manages to maintain a conversational style that is most engaging. It is a great read. The last chapter contains a fascinating discussion of artificial intelligence.

For professionals in the fields of Web design, information architecture, or search engine marketing it is a remarkable source of insights, and of material — especially quotes — for use in presentations and proposals to clients and management.

A text worth a place on reading lists for just about any tertiary course.
 
Peter Morville: Ambient Findability
ISBN 0-596-00765-5
Published by O'Reilly,
188 pp.
$55.00 incl. GST

Information Architecture

If documents — to use that term in its widest sense — are not properly labelled and/or their content is not adequately organised, then search engines are not likely to find them as readily as documents that conform to the principles of information architecture (IA).
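
Why labelling matters to a search engine can be shown with a small sketch. The field names and weights below are hypothetical, invented for the illustration; no actual engine is claimed to use them. The only point is that a query term found in a label can count for more than the same term buried in the body, so an unlabelled document starts at a disadvantage.

```python
import re

# Hypothetical field weights: a match in a label counts for more than one in the body.
FIELD_WEIGHTS = {"title": 5, "keywords": 3, "body": 1}


def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def findability_score(query, document):
    """Sum weighted counts of query terms across whichever fields the document has."""
    terms = set(words(query))
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        # A missing field contributes nothing to the score.
        field_words = words(document.get(field, ""))
        score += weight * sum(1 for w in field_words if w in terms)
    return score


# Two documents with identical body text; only one carries labels.
labelled = {
    "title": "Thesaurus management tools",
    "keywords": "thesaurus, taxonomy",
    "body": "A survey of thesaurus software.",
}
unlabelled = {"body": "A survey of thesaurus software."}

print(findability_score("thesaurus", labelled))
print(findability_score("thesaurus", unlabelled))
```

Under these assumed weights the labelled document scores well above its unlabelled twin for the same query, although the body text is identical.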

The term, information architecture, was coined some twenty years ago by Richard Wurman, an architect who took an interest in methods of collecting, organising, and presenting information of interest to architects and urban planners. For an interesting lecture on Wurman and the origins of information architecture go to:
http://www.gslis.utexas.edu/~I38613dw/readings/InfoArchitecture.html

On the dust jacket of Wurman and Bradford, eds.: Information Architects (Graphis Press, 1996, ISBN 3-85709-458-3), 'information architect' is defined thus:

  1. the individual who organizes the patterns inherent in data, making the complex clear.
  2. a person who creates the structure or map of information which allows others to find their personal paths to knowledge.
  3. the emerging 21st century professional occupation addressing the needs of the age focused upon clarity, human understanding, and the science of the organization of information.

The authors of Information Architecture for the World Wide Web (discussed below) define IA as:

  1. The combination of organisation, labelling, and navigation schemes within an information system.
  2. The structural design of an information space to facilitate task completion and intuitive access to content.
  3. The art and science of structuring and classifying Web sites and intranets to help people find and manage information.
  4. An emerging discipline and community of practice focused on bringing principles of design and architecture to the digital landscape.

For us ordinary folk a less formal definition is:

"Information architecture is the science of figuring out what you want your site to do and then constructing a blueprint before you dive in and put the thing together" (http://www.webmonkey.wired.com).

Information Architecture for the World Wide Web

Over the years a number of texts have been published on IA, amongst which is Information Architecture for the World Wide Web. The second edition was published in 2002, is still in print, and is fully relevant. As the definitions above indicate, IA is not a technology in the usual sense of the word; there is not a new version or release every Saturday night.

Information Architecture for the World Wide Web is not designed for casual Web authors who create occasional one-off static pages; it is an introduction to the organisation of information on large-scale Web sites in a way that helps the search and retrieval process.

Any informed lay reader with an interest in the search process, or information architecture, should find the content interesting and informative. The book is unusual in that there is no example source code, no algorithms, and a complete absence of abstruse discussion. In the preface the authors describe themselves as "information architecture evangelists at heart" and describe their intended audience as "anyone who's interested in information architecture, and maybe a few who aren't". Many authors of IA texts are similarly keen to press the case for a formal approach to the organisation of information.

A feature of the book that impressed me is the care taken to explain terms and concepts. For example, granularity crops up in various computer science texts (especially to do with data compression); in this book it is defined. There is even a section entitled Technical Lingo. Readers don't need a degree in computer science to appreciate IA and this title does not assume any special background. Indeed, the authors make the point, "search is not an IT thing", and go on to say that "... ultimately search is here for users, and it's the responsibility of the information architect to advocate for users. An information architect will typically understand more [than an IT specialist] about how a search engine might benefit users by leveraging metadata or how it should be integrated with browsing ...". If you are thinking of an IA career this book is the best introduction I have seen.

Searching and IA are two sides of the same coin, and the authors provide some exceptionally useful and lucid discussions of how search engines operate.

For those with an interest in the IA toolbox, Part IV of the book, Information Architecture in Practice, includes a chapter on tools and software that lists, by category, resources (literature, much of which is available on Web sites) and sources (URLs) for software such as filtering tools, portal solutions, and thesaurus management tools.

If you would like to explore IA career possibilities, read this book (it also provides URLs for access to the IA community). If you are an ordinary user who would like to be better informed about IA and search engines, read this book.

Rosenfeld and Morville:
Information Architecture for the World Wide Web 2/e
ISBN 0-596-00035-9
Published by O'Reilly,
461 pp.,
RRP $74.95 incl. GST

The Search

This is not just another book about the rise and rise of Google; it looks at the history of the search industry and where it is going in respect of its commercial direction and development of the technology.

So, who is the book's audience? Anyone with an interest in the development of search technologies, whether as a software engineer or developer; practitioners in the field of information storage and retrieval; students or practitioners of business administration; investment analysts; social anthropologists examining the social impact of computer-based communications; historians of the digital age; and informed lay readers who like to keep abreast of what is happening around them.

It should be mandatory reading for those people who champion the idea of providing everybody — at least, everybody who is likely to vote for them — with a high capacity Internet connection. In the first two chapters the author does some philosophising (The Database of Intentions) and examines the who, what, why, where, and how of searching. There is an interesting graph that suggests that the "query frequency" for sex is higher in the order of "11-110 a thousandfold ... [than] the average query". Those interested in pursuing research on 'what is searched' will find useful references.

The Search is a valuable contribution to a small, but significant body of literature that deals with the history and many non-technical and quasi-technical aspects of the computer era, especially the social impact.

It is a good read. The author does not dwell on technical matters, but is able to explain to lay readers how search systems work; along the way he throws up many interesting anecdotes about how business deals were done, and discusses future developments. It paints the big picture, but includes a surprising amount of detail.

Books like this are far too few and we should be grateful for the opportunity to share John Battelle's insights and accounts of events.

John Battelle: The Search
ISBN 1-85788-361-6
Published by Nicholas Brealey
(distributed in Australia by Allen and Unwin),
111 pp., index, notes, hardcover,
RRP $39.95 incl. GST

Reprinted from the July 2006 issue of PC Update, the magazine of Melbourne PC User Group, Australia
