From: John Conover <john@email.johncon.com>
Subject: Re: Learning Histories LO4560
Date: Wed, 3 Jan 1996 17:28:40 -0800

GaltJohn22@aol.com writes:
> Replying to LO4531 --
>
> Well...
>
> HyperText is, in a deliberate sense, NOT hierarchical. It is rather a
> "Sparse Matrix" where depth and breadth are supposedly indeterminate and
> irrelevant over time. That is, it can grow in all directions or a few or
> several at once with such growth neither planned nor managed.
>

HyperText has the disadvantage that the data's structural links must be
defined when the documents are stored in an information retrieval
system, in anticipation of the queries that will be asked in the future.
In this sense, it is more like a table of contents than an index. It is
a one way street: once you structure the data, it is very difficult to
change the structure (and, BTW, the way you look at the data).

A good example of this is the yellow pages in a telephone book. Finding
something by subject is very fast and expedient; however, given the
phone number of a business, finding the subject that corresponds to the
phone number degenerates into an exhaustive search. The Unix man pages
are another good example: if you know the command you are looking for,
finding it is very easy; if you don't, it is very difficult.

        John

BTW, Hypertext is a rather new name for what was called Memex as
proposed by Vannevar Bush (Bush, V. (1941) "Memorandum regarding Memex,"
[Vannevar Bush Papers, Library of Congress], Box 50, General
Correspondence File, Eric Hodgins,) and the issues involved in a priori
structuring of the data (technically known as "content data,") were
known and understood at that time.

A better alternative is probably to order the documents by relevance, at
search time, as opposed to structuring the documents when the documents
are stored. The technical name for data provided in such a manner is
"context data."

Just in case you were curious ...

References:

    William B. Frakes and Ricardo Baeza-Yates, "Information Retrieval",
    Prentice-Hall, Englewood Cliffs, New Jersey, 1992.
    (Note: The sources for many of the algorithms presented in Frakes
    are available from ftp.vt.edu:/pub/reuse/ircode.tar.Z via anonymous
    ftp.)

    Charles T. Meadow, "Text Information Retrieval Systems", Academic
    Press, San Diego, California, 1992.

    Carol Tenopir and Jung Soon Ro, "Full Text Databases", Greenwood
    Press, New York, New York, 1990.

    Susan Jones, "Text and Context", Springer-Verlag, London, England,
    1991.

Freely available information retrieval programs that support relevance
ordering of documents and are available via anonymous ftp on the
Internet (these ftp addresses are for the program source codes):

    Wais, think.com:/wais/wais-8-b5.1.tar.Z.

    Lq-text, cs.toronto.edu:/pub/lq-text1.10.tar.Z.

    Qt, ftp.uu.net:/usenet/comp.sources/unix/volume27.

    Rel, ftp.uu.net:/usenet/comp.sources/unix/Volume 28, Issue 212.

--
John Conover, john@email.johncon.com, http://www.johncon.com/
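
P.S. A minimal sketch, in Python, of what ordering documents by
relevance at search time can look like. It is not how Wais, Lq-text, Qt,
or Rel work internally. The function names, the scoring rule (a raw
count of query-word occurrences), and the phone-book listings are all
assumptions made up for illustration. The point is only that nothing is
structured when the listings are stored, so a query by subject and a
query by phone number go through the same path.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_relevance(documents, query):
    """Return (name, score) pairs for matching documents, best first.

    documents: dict mapping a document name to its full text.
    The score is an assumed, deliberately crude measure: the number of
    times any query word occurs in the document.
    """
    query_words = set(tokenize(query))
    scores = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            scores.append((name, score))
    # Highest score first; ties are broken by name so output is stable.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical listings, stored with no subject headings or links at all.
listings = {
    "Acme Plumbing": "plumbing and drain repair, phone 555-0101",
    "Bay Florist": "flowers and gifts, phone 555-0102",
    "City Drain Service": "drain cleaning, drain repair, phone 555-0103",
}

if __name__ == "__main__":
    print(rank_by_relevance(listings, "drain repair"))  # query by subject
    print(rank_by_relevance(listings, "555-0102"))      # query by number
```

Every query here still scans every listing; the programs listed above
presumably avoid that with an index built at storage time, but that
index is over words, not over a fixed subject structure.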