From: John Conover <john@email.johncon.com>
Subject: TeX/LaTeX in relevance searching
Date: Wed, 23 Nov 94 01:27 PST

FYI, attached please find a copy of correspondence between
bnb@math.ams.org and myself (sometime in 1993). The context of the
discussion was the standardization of TeX/LaTeX to facilitate an
information retrieval (i.e., electronic literature search) system
implemented via an inverted index. I think the concept of the
relevance search schema may be relevant to this discussion, although
any discussion of the standardization of TeX/LaTeX would be
inappropriate here and should be directed elsewhere.

> If you are using a full text database information retrieval system
> (i.e., an electronic literature search system), it is an advantage to
> be able to do relevance searches. For example, the incidence of a word
> found in a \section{...} heading would be weighted higher than if the
> word was found simply in a paragraph. Note the issue here: relevance
> information can be obtained from the way the author structured the
> document.
>
> To build such an information retrieval system (presumably distributed
> over a heterogeneous network), I need the syntax to a document
> structure standard. Quite probably, LaTeX comes closest to meeting
> the requirements. SGML also overlaps into this area, and has
> significant inertia in the marketplace (particularly Europe) since
> it is, arguably, an international standard. (Yes, I understand that
> TeX is a typesetting language, but the LaTeX macros extend this
> capability into the document structure area.)
>
> Not too many systems will allow you to query for the contents of a
> table, citations, figure captions, etc. (Or for "where was that
> integral," i.e., query for \int ...) Could possibly use the Unix MTA as
> a carrier, also.

--
John Conover, john@email.johncon.com, http://www.johncon.com/
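
As an illustration of the weighting idea in the quoted note, here is a minimal Python sketch of an inverted index in which a word found in a \section{...} heading counts for more than the same word in a paragraph. It is not the system discussed in the correspondence. The names build_index and search, the weights 5.0 and 1.0, and the simple regular expressions are all assumptions made for the example; a real system would need a proper LaTeX parser and would also treat captions, tables, citations, and math.

```python
import re
from collections import defaultdict

# Hypothetical weights: the note only says that a heading occurrence
# should be weighted higher than a paragraph occurrence.
HEADING_WEIGHT = 5.0
BODY_WEIGHT = 1.0

# Matches \section{...}, \subsection{...}, \subsubsection{...} (optionally
# starred) whose argument contains no nested braces. This is a simplification.
HEADING_RE = re.compile(r"\\(?:sub)*section\*?\{([^{}]*)\}")
CONTROL_SEQ_RE = re.compile(r"\\[a-zA-Z]+")
WORD_RE = re.compile(r"[a-z]+")


def words(text):
    """Lower-case the text and split it into alphabetic words."""
    return WORD_RE.findall(text.lower())


def build_index(docs):
    """Build an inverted index: word -> {doc_id: accumulated weight}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, source in docs.items():
        # Words in section headings get the higher weight.
        for heading in HEADING_RE.findall(source):
            for w in words(heading):
                index[w][doc_id] += HEADING_WEIGHT
        # Remove headings and remaining control sequences, then index the rest.
        body = HEADING_RE.sub(" ", source)
        body = CONTROL_SEQ_RE.sub(" ", body)
        for w in words(body):
            index[w][doc_id] += BODY_WEIGHT
    return index


def search(index, query):
    """Return doc ids ordered by total weight of the query words, best first."""
    scores = defaultdict(float)
    for w in words(query):
        for doc_id, weight in index.get(w, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "a": r"\section{Fourier transforms} We discuss integrals.",
        "b": r"\section{Integrals} The fourier transform appears once.",
    }
    index = build_index(docs)
    print(search(index, "fourier"))
    print(search(index, "integrals"))
```

With the two sample documents, a query for "fourier" ranks document "a" first because the word sits in its heading, while a query for "integrals" ranks "b" first for the same reason. The ranking comes from the author's own document structure, which is the point made in the note.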