Where did the name "ndex"
come from? The program was originally modified from
the bsearchtext
in the ReceivedIP
program suite where it was used for constructing fast
whitelist/blacklist databases accessible in procmail(1) scripts. It
was then used in a very large Indexed Sequential Access
Method (ISAM) database for archiving financial (i.e.,
"stock ticker") data. The database was spread across
several machines, each running multiple mounted volumes
(usually one volume per month), and each machine, being
network accessible, handled the historical data for an
entire calendar year. It was used as the ISAM "data blade"
engine for the tsinvest
program at the NtropiX
site and the NdustriX
program suite. So, the index part of the ISAM was handled by
a perl(1) preprocessing script that would direct a query to
a specific machine, and then to a specific disk/volume, and
the ndex
program would return the financial time series (indexed in
the first field by YYYYMMDDMMSS) from a specific date/time
to a specific date/time, inclusive.
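The routing and inclusive range selection described above can be sketched in Python. This is only an illustration under stated assumptions, not the original perl(1) script or ndex itself: the names `route` and `select_range` are hypothetical, records are assumed to be tab delimited with the timestamp key in the first field, and only the leading YYYYMM of the key is used to pick the machine (year) and the volume (month).

```python
from bisect import bisect_left, bisect_right


def route(timestamp):
    """Return (year, month): the year picks the machine, the month the volume."""
    return timestamp[:4], timestamp[4:6]


def select_range(records, start, end):
    """Return the records whose first field lies in [start, end], inclusive.

    `records` must already be sorted on the first (timestamp) field.
    """
    keys = [record.split("\t", 1)[0] for record in records]
    low = bisect_left(keys, start)    # first key >= start
    high = bisect_right(keys, end)    # one past the last key <= end
    return records[low:high]
```

Because the keys are fixed-width digit strings, plain string comparison orders them chronologically, which is what makes a binary search on the first field sufficient.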
What is a simple test of the ndex
program? Just sort the Unix system dictionary,
something like "sort -u /usr/local/lib/dict/websters >
db" to make a database, then "ndex db" and type some words
to look up.
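For readers who want to see what this test exercises, here is a minimal Python sketch of the same idea: build a sorted, duplicate-free word list and look words up by binary search. It is not the ndex implementation; `build_db` and `lookup` are hypothetical names, the list is held in memory rather than in a file, and ordering is by code point, which corresponds to sort(1) in the C locale rather than to a locale-specific collation.

```python
from bisect import bisect_left


def build_db(words):
    """Rough equivalent of `sort -u`: sorted, with duplicates removed."""
    return sorted(set(words))


def lookup(db, word):
    """Binary search for an exact match in the sorted list."""
    i = bisect_left(db, word)
    return i < len(db) and db[i] == word
```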
What is a simple test of the ndex
program on large database files? Again, just sort
multiple instances of the Unix system dictionary,
something like "sort /usr/local/lib/dict/websters
/usr/local/lib/dict/websters ... > db" to make a large
database, then "ndex db" and type some words to look
up.
Is there a command line option for the keywords
to search for in ndex?
Yes, you can "ndex db word1 word2 word3 ...", or, you can
"ndex db < myfile", where myfile contains the words
word1, word2, word3, ... one word per line/record. The
database filename is a required argument. Any other
non-switch arguments are regarded as words to look up; if
there are none, ndex
expects the words to come from stdin, one word per
record/line.
How can I make a large tab delimited flat file
database? Use the Unix paste(1) command, for
example, "paste websters websters > db" will make a tab
delimited file of all the words in the Unix system
dictionary.
How can I print the words on the command line,
or input file on stdin, that are NOT in the
database? Use the -P option, for example "ndex -P
db < myfile".
How do I do partial key searches? With a
partial key search, looking up the word
"John" returns all words in the database that
begin with "John", such as "Johnny", etc. You do this with
the -e option, for example, "ndex -e db John".
How do I do specific, non-partial key,
searches in a flat file database? A partial key search
for the word "John" returns
all words in the database that begin with "John",
such as "Johnny", etc. You do this with the -e option, for
example, "ndex -e db John". You do the exact same thing when
searching for a specific, non-partial key word, but you
include the trailing tab delimiter, for example, "ndex -e db
'John '" (where the whitespace before the closing quote is a tab
character; you can use Ctrl-V-Tab when using the bash(1)
shell to insert tabs).
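The difference between a partial key and a tab-terminated key can be illustrated with a small Python sketch. It assumes a sorted, tab delimited list of records held in memory; `prefix_search` is a hypothetical name and this is not how ndex itself is implemented.

```python
from bisect import bisect_left


def prefix_search(records, prefix):
    """Return every record in the sorted list that begins with `prefix`."""
    i = bisect_left(records, prefix)   # first record >= prefix
    matches = []
    while i < len(records) and records[i].startswith(prefix):
        matches.append(records[i])
        i += 1
    return matches


records = sorted(["John\t2", "Johnny\t3", "Johnson\t4", "Joan\t1", "Jon\t5"])

partial = prefix_search(records, "John")    # John, Johnny and Johnson
exact = prefix_search(records, "John\t")    # only the record keyed "John"
```

Appending the tab to the key turns the prefix match into an exact match on the first field, because no longer key can have a tab in that position.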
How do you update an ndex
database without concurrency control? The rename
program is used specifically for that purpose. It uses
rename(2) to move a file in such a manner that either the
old, or the new, version will be available to the ndex
program at every instant. So, one would make a new
version of the database, db.new, and then "rename db.new db"
where ndex
operates on the database file, db.
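The same write-then-rename pattern can be sketched in Python, where os.replace() performs the rename(2) step on POSIX systems. This is a sketch under assumptions, not the rename program from the distribution: `publish` is a hypothetical name, and the new file is created in the same directory as the database so that the rename never has to cross a filesystem boundary.

```python
import os
import tempfile


def publish(db_path, records):
    """Write a new sorted database, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(db_path))
    # Build the new version beside the old one (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="db.new.")
    try:
        with os.fdopen(fd, "w") as new_db:
            for record in sorted(records):
                new_db.write(record + "\n")
            new_db.flush()
            os.fsync(new_db.fileno())   # data on disk before the swap
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp_path, db_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that mkstemp() creates the file readable only by its owner, so a real deployment would probably want to set the permissions before the swap.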
How do you access ndex
across a network? There are several ways. It is
best, when at all possible, to use existing networking
agents of known security robustness. If ssh
is adequately fast (and the remote is running the ssh
daemon), use an ssh tunnel:
ssh theaccount@theremote.com ndex ndexfile keyword
will encrypt the transaction, and query the remote
system's ndexfile for the keyword. Netcat and tcpserver, which
can run under daemontools or
inetd(8), are other alternatives.
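If the ssh query is driven from a script, the keyword has to survive the remote shell, since ssh hands the remote command to a shell as a single string. A hedged Python sketch follows; `remote_query_argv` and `remote_query` are hypothetical helper names, and the account and file names are simply the placeholders used above.

```python
import shlex
import subprocess


def remote_query_argv(account, db_file, keyword):
    """Build the local argv for `ssh account ndex db_file keyword`."""
    # Quote each word so spaces or tabs in the keyword reach ndex intact.
    remote_command = " ".join(
        shlex.quote(part) for part in ("ndex", db_file, keyword)
    )
    return ["ssh", account, remote_command]


def remote_query(account, db_file, keyword):
    """Run the query over ssh and return the output lines."""
    result = subprocess.run(
        remote_query_argv(account, db_file, keyword),
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

For example, remote_query("theaccount@theremote.com", "ndexfile", "John") runs the same command as the ssh line shown above.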
How do you search for phonetic
keywords? Hint: use the phonetic/simplex system out
of the rel.tar.gz
sources at NformatiX as
the search key (e.g., the first field, the one ndex
will find after converting the keyword/phrase to soundex, maybe
using a partial key search) in a tab delimited flat file
database. The remaining fields can be whatever is
appropriate: names, addresses, pointers to other files or
databases, etc. See pdict.tar.gz
for a similar approach to looking up words, phonetically,
in the Unix system dictionary.
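As an illustration of a phonetic first field, the Python sketch below computes a standard American Soundex code and prepends it to a tab delimited record. This is an assumption for the sake of example: the phonetic/simplex code in rel.tar.gz may use a different coding, and `soundex` and `keyed_record` are hypothetical names.

```python
# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four character Soundex code, or "" if there are no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue                    # h and w do not separate equal codes
        code = CODES.get(letter, "")    # vowels map to "" and do separate them
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]


def keyed_record(name, *fields):
    """Build a tab delimited record with the Soundex code as the first field."""
    return "\t".join((soundex(name), name) + fields)
```

The database would then be sorted on the code, and a query converts the keyword with the same function before the lookup, so "John" and "Johnny" both search for J500.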
If several flat file databases/tables are
appended simultaneously across a network and/or volumes, is
there a concurrency issue with ndex?
Yes, a lock agent will have to be implemented to avoid the
phantom
record problem. Although the lock agent itself is rather
straightforward, the network/volume wide fault and
exception handling is not. Bear in mind that ndex
was implemented as a constant database, where updates are
appended to the end of the database file, as is the usual
case for financial time series databases. As an example of
the complexity using semaphore locks, see dbappend
in the dbappend.shar.gz
archive.
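To give a feel for the easy part only, here is a minimal single-host sketch of a locked append in Python. It uses an advisory flock(2) lock through the fcntl module rather than the semaphore locks of dbappend, it assumes every writer cooperates by taking the same lock, and it does nothing about the network/volume wide fault handling mentioned above; `append_record` is a hypothetical name.

```python
import fcntl
import os


def append_record(path, record):
    """Append one record to the end of the database under an exclusive lock."""
    with open(path, "a") as db:
        fcntl.flock(db, fcntl.LOCK_EX)      # blocks until no other writer holds it
        try:
            db.write(record.rstrip("\n") + "\n")
            db.flush()
            os.fsync(db.fileno())           # record is on disk before unlocking
        finally:
            fcntl.flock(db, fcntl.LOCK_UN)
```

For a time series, each appended key is later than the previous one, so appending keeps the file in sorted order.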