Relevance Searching

From: John Conover <john@email.johncon.com>
Subject: Relevance Searching
Date: Mon, 6 Dec 1993 23:29:35 -0800 (PST)


The following is a copy of a correspondence between bnb@math.ams.org
and myself. The context of the discussion was the standardization of
TeX/LaTeX to facilitate an information retrieval (i.e., electronic
literature search) system implemented via an inverted index. I think
the concept of relevance searching described below may be pertinent
to this discussion, although any discussion of the standardization of
TeX/LaTeX would be inappropriate here and should be directed to
comp.text.tex.

> If you are using a full-text database information retrieval system
> (i.e., an electronic literature search system), it is an advantage to
> be able to do relevance searches. For example, an occurrence of a word
> in a \section{...} heading would be weighted higher than an occurrence
> of the word simply in a paragraph. Note the issue here: relevance
> information can be obtained from the way the author structured the
> document.
>
> To build such an information retrieval system (presumably distributed
> over a heterogeneous network), I need the syntax of a document
> structure standard. Quite probably, LaTeX comes closest to meeting
> the requirements. SGML also overlaps this area, and has significant
> inertia in the marketplace (particularly in Europe) since it is,
> arguably, an international standard. (Yes, I understand that TeX is
> a typesetting language, but the LaTeX macros extend this capability
> into the document structure area.)
>
> Not too many systems will allow you to query for the contents of a
> table, citations, figure captions, etc. (Or for "where was that
> integral," i.e., query for \int ...) The Unix MTA (mail transfer
> agent) could possibly also be used as a carrier.
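To make the weighting idea above concrete, here is a minimal sketch in Python of an inverted index whose postings record which structural element of a LaTeX source each term came from, so that hits in a \section{...} heading score higher than hits in plain paragraph text, and so that a query can be restricted to, say, figure captions. Everything here is an illustrative assumption rather than the system discussed in the correspondence: the field names, the weights in the hypothetical FIELD_WEIGHTS table, and the regular expressions, which handle only simple, unnested \section, \caption and \cite arguments (real LaTeX needs a proper parser).

```python
import re
from collections import defaultdict

# Hypothetical weights (assumptions, not any standard): a term found in a
# structural element counts more than the same term in ordinary body text.
FIELD_WEIGHTS = {"section": 5.0, "caption": 3.0, "cite": 2.0, "body": 1.0}

# Simplified patterns for a few LaTeX commands; nested braces, starred
# forms and optional arguments are deliberately not handled.
FIELD_PATTERNS = {
    "section": re.compile(r"\\section\{([^}]*)\}"),
    "caption": re.compile(r"\\caption\{([^}]*)\}"),
    "cite": re.compile(r"\\cite\{([^}]*)\}"),
}

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


def extract_fields(source):
    """Split a LaTeX source into (field, text) pairs; leftovers are 'body'."""
    fields = []
    body = source
    for name, pattern in FIELD_PATTERNS.items():
        for match in pattern.finditer(source):
            fields.append((name, match.group(1)))
        body = pattern.sub(" ", body)  # drop the command from the body text
    fields.append(("body", body))
    return fields


def build_index(docs):
    """Inverted index: term -> field -> {doc_id: occurrence count}."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for doc_id, source in docs.items():
        for field, text in extract_fields(source):
            for term in tokenize(text):
                index[term][field][doc_id] += 1
    return index


def search(index, query, fields=None):
    """Rank documents by field-weighted term counts, best first.

    If `fields` is given, only occurrences in those fields count,
    e.g. fields={"caption"} searches captions only.
    """
    scores = defaultdict(float)
    for term in tokenize(query):
        for field, postings in index.get(term, {}).items():
            if fields is not None and field not in fields:
                continue
            for doc_id, count in postings.items():
                scores[doc_id] += FIELD_WEIGHTS[field] * count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage with two tiny, made-up documents.
docs = {
    "a.tex": r"\section{Inverted Index} We describe the index.",
    "b.tex": r"\section{Results} An inverted index was used. \caption{Index size}",
}
index = build_index(docs)
print(search(index, "inverted index"))               # heading hit ranks first
print(search(index, "index", fields={"caption"}))    # captions only
```

With these assumed weights, a.tex scores 11.0 (both query terms in its heading plus one body hit) and b.tex scores 5.0 (body and caption hits), so the document whose author put the terms in a section heading ranks first. Storing the field alongside each posting, rather than folding the weight in at indexing time, is what allows the same index to answer both weighted and field-restricted queries.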

--

John Conover, john@email.johncon.com, http://www.johncon.com/

Last modified: Fri Mar 26 18:58:44 PST 1999 $Id: 931207073001.2661.html,v 1.0 2001/11/17 23:05:50 conover Exp $