NdeX: Frequently Asked Questions, (FAQs)

Program for Binary Searching a Constant Flat File Database:

Frequently Asked Questions, (FAQs)

Where did the name "ndex" come from? The program was originally modified from bsearchtext in the ReceivedIP program suite, where it was used for constructing fast whitelist/blacklist databases accessible from procmail(1) scripts. It was then used in a very large Indexed Sequential Access Method (ISAM) database for archiving financial (i.e., "stock ticker") data. The database was spread across several machines, each with multiple mounted volumes (usually one volume per month); each machine was network accessible and handled the historical data for an entire calendar year. It served as the ISAM "data blade" engine for the tsinvest program at the NtropiX site and for the NdustriX program suite. The index part of the ISAM was handled by a perl(1) preprocessing script that directed a query to a specific machine, and then to a specific disk/volume; the ndex program then returned the financial time series (indexed in the first field by YYYYMMDDMMSS) from a specific date/time to a specific date/time, inclusive.
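The inclusive date/time range lookup described above can be sketched in Python with two binary searches over the key field. This is only an illustration, not the ndex source: the function name range_query and the sample records are hypothetical, and the keys are treated as opaque fixed-width strings in the first tab delimited field.

```python
from bisect import bisect_left, bisect_right


def range_query(records, start, end):
    """Return the records whose first field lies in [start, end], inclusive.

    `records` must already be sorted by the first tab delimited field.
    """
    # Extracting the keys is linear; a real tool would search the file itself.
    keys = [r.split("\t", 1)[0] for r in records]
    lo = bisect_left(keys, start)   # first key >= start
    hi = bisect_right(keys, end)    # one past the last key <= end
    return records[lo:hi]


if __name__ == "__main__":
    # Hypothetical sample data: 12-character key, then a price.
    sample = [
        "200701020930\t10.0",
        "200701021000\t10.5",
        "200701031000\t11.0",
        "200702011000\t12.0",
    ]
    for rec in range_query(sample, "200701021000", "200701031000"):
        print(rec)
```

Because the keys are fixed width and purely numeric, plain string comparison orders them chronologically, which is what makes a binary search on the first field possible.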
What is a simple test of the ndex program? Just sort the Unix system dictionary, something like "sort -u /usr/local/lib/dict/websters > db" to make a database, then "ndex db" and type some words to look up.
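For readers who want to see what a binary search of a sorted flat file involves, here is a minimal Python sketch. It is not the ndex implementation; the names build_db, lookup, and _line_at are hypothetical. It assumes the file is sorted bytewise, with every line terminated by a newline, and it returns the first line that begins with the key.

```python
import os
import tempfile


def build_db(path, words):
    """Write the unique words in bytewise sorted order, one per line."""
    with open(path, "wb") as f:
        for w in sorted(set(w.encode() for w in words)):
            f.write(w + b"\n")


def _line_at(f, pos):
    """Return the first complete line starting at offset >= pos (b'' at EOF)."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # finish the line that contains byte pos - 1
    return f.readline()


def lookup(path, key):
    """Return the first line that starts with `key`, or None if there is none."""
    target = key.encode()
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line is >= target.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at(f, mid)
            if not line or line.rstrip(b"\n") >= target:
                hi = mid
            else:
                lo = mid + 1
        line = _line_at(f, lo).rstrip(b"\n")
    return line.decode() if line and line.startswith(target) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "db")
        build_db(db, ["pear", "apple", "fig", "apple", "kiwi"])
        print(lookup(db, "fig"), lookup(db, "zebra"))
```

The search works on byte offsets rather than line numbers, so no index of line positions is needed; each probe seeks into the file and reads forward to the next line boundary.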
What is a simple test of the ndex program on large database files? Again, just sort multiple instances of the Unix system dictionary, something like "sort /usr/local/lib/dict/websters /usr/local/lib/dict/websters ... > db" to make a large database, then "ndex db" and type some words to look up.
Is there a command line option for the keywords to search for in ndex? Yes, you can "ndex db word1 word2 word3 ...", or, you can "ndex db < myfile", where myfile contains the words word1, word2, word3, ... one word per line/record. The database filename is a required argument. Any other non-switch arguments are regarded as words to look up; if there are none, ndex expects the words to come from stdin, one word per record/line.
How can I make a large tab delimited flat file database? Use the Unix paste(1) command, for example, "paste websters websters > db" will make a tab delimited file of all the words in the Unix system dictionary.
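As a rough illustration of what paste(1) produces in that example, the following Python sketch joins corresponding lines with a tab. The function name paste is only a stand-in for the Unix command.

```python
from itertools import zip_longest


def paste(*columns):
    """Join corresponding lines of each column with a tab, like paste(1)."""
    # A column that runs out contributes empty fields, as paste(1) does.
    return ["\t".join(fields) for fields in zip_longest(*columns, fillvalue="")]


if __name__ == "__main__":
    words = ["apple", "fig", "kiwi"]
    for record in paste(words, words):
        print(record)
```

Each output record has the word in the first field, which serves as the search key, followed by a tab and the second field.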
How can I print the words on the command line, or input file on stdin, that are NOT in the database? Use the -P option, for example "ndex -P db < myfile".
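The idea of reporting only the words that are absent can be sketched in Python as follows. This is a conceptual illustration, not how ndex implements -P; the function name missing is hypothetical, and the sketch assumes an exact match against a sorted, in-memory list of keys.

```python
from bisect import bisect_left


def missing(sorted_keys, queries):
    """Return the queries that do not occur in the sorted key list."""
    absent = []
    for q in queries:
        i = bisect_left(sorted_keys, q)  # position of the first key >= q
        if i == len(sorted_keys) or sorted_keys[i] != q:
            absent.append(q)
    return absent


if __name__ == "__main__":
    db = ["apple", "fig", "kiwi"]
    print(missing(db, ["fig", "grape", "apple", "zebra"]))
```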
How do I do partial key searches? A partial key search is one where you search for the word "John" and it returns all words in the database that begin with "John", such as "Johnny", etc. You do this with the -e option, for example, "ndex -e db John".
How do I do specific, non-partial key, searches in a flat file database? A partial key search for the word "John", done with the -e option as in "ndex -e db John", returns all words in the database that begin with "John", such as "Johnny", etc. To search for a specific, non-partial key, do exactly the same thing but include the trailing tab delimiter in the key, for example, "ndex -e db 'John	'" (where the whitespace after John is a tab character; in the bash(1) shell you can insert a tab by typing Ctrl-V and then Tab).
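Why the trailing tab turns a partial key search into an exact one can be seen in a small Python sketch. It assumes sorted, tab delimited records held in memory; the function name prefix_matches and the sample records are hypothetical and are not taken from ndex.

```python
from bisect import bisect_left


def prefix_matches(records, key):
    """Return every record in the sorted list that begins with `key`."""
    i = bisect_left(records, key)  # first record >= key
    found = []
    while i < len(records) and records[i].startswith(key):
        found.append(records[i])
        i += 1
    return found


if __name__ == "__main__":
    # Hypothetical sample data, sorted on the first field.
    db = sorted(["John\tSmith", "Johnny\tCash", "Johnson\tLyndon", "Jon\tSnow"])
    print(prefix_matches(db, "John"))    # partial key
    print(prefix_matches(db, "John\t"))  # key plus delimiter
```

With the delimiter appended, only records whose first field is exactly "John" can begin with the key, so "Johnny" and "Johnson" no longer match.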
How do you update an ndex database without concurrency control? The rename program is used specifically for that purpose. It uses rename(2) to move a file in such a manner that either the old or the new version is available to the ndex program at every instant. So, one would make a new version of the database, db.new, and then "rename db.new db", where db is the database file that ndex operates on.
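The same write-then-rename pattern can be sketched in Python, where os.replace wraps rename(2) on POSIX systems. This is an illustration of the technique, not the rename program itself; the function name publish is hypothetical, and the db.new naming follows the example above.

```python
import os
import tempfile


def publish(db_path, lines):
    """Build db_path + '.new' in full, then atomically move it over db_path."""
    new_path = db_path + ".new"  # same directory, hence the same filesystem
    with open(new_path, "w") as f:
        f.writelines(line + "\n" for line in sorted(lines))
        f.flush()
        os.fsync(f.fileno())     # push the data to disk before the switch
    os.replace(new_path, db_path)  # readers see the old file or the new one


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "db")
        publish(db, ["fig", "apple"])
        publish(db, ["kiwi", "fig", "apple"])
        with open(db) as f:
            print(f.read(), end="")
```

A reader that already has the old file open keeps reading the old contents until it reopens the file, so a lookup never sees a half-written database.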
How do you access ndex across a network? There are several ways. It is best, whenever possible, to use existing networking agents of known security robustness. If ssh is adequately fast (and the remote is running the ssh daemon), run ndex on the remote over ssh. The command:
```
ssh theaccount@theremote.com ndex ndexfile keyword
```
will encrypt the transaction and query the remote system's ndexfile for the keyword. Netcat and tcpserver, which can run under daemontools or inetd(8), are other alternatives.
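When the ssh query is issued from a script, the keyword may need quoting, because ssh hands the remote command to a shell on the other side. The Python sketch below only builds the argument list for the ssh command shown above; the function name remote_ndex_argv is hypothetical, and the account and host are the placeholder values from the example.

```python
import shlex


def remote_ndex_argv(account, host, *ndex_args):
    """Build an argv list that runs ndex on the remote host through ssh."""
    # Quote each argument so that tabs and spaces survive the remote shell.
    remote_cmd = " ".join(shlex.quote(a) for a in ("ndex",) + ndex_args)
    return ["ssh", f"{account}@{host}", remote_cmd]


if __name__ == "__main__":
    print(remote_ndex_argv("theaccount", "theremote.com", "ndexfile", "keyword"))
```

The resulting list could be passed to subprocess.run(argv, capture_output=True, text=True) to collect the matching records, assuming ssh access to the remote is already set up.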
How do you search for phonetic keywords? Hint: use the phonetic/simplex system from the rel.tar.gz sources at NformatiX to generate the search key (e.g., the first field, the one ndex will find after the keyword/phrase has been converted to soundex, perhaps using a partial key search) in a tab delimited flat file database. The remaining fields can be whatever is appropriate: names, addresses, pointers to other files or databases, etc. See pdict.tar.gz for a similar approach to looking up words, phonetically, in the Unix system dictionary.
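A phonetic key in the first field might be arranged as in the Python sketch below. It uses the standard American Soundex code as a stand-in; the phonetic/simplex system in rel.tar.gz may well differ. The names soundex, build_phonetic_db, and phonetic_lookup are hypothetical.

```python
from bisect import bisect_left

# Standard American Soundex digit groups; all other letters carry no code.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word):
    """Return the four character American Soundex code of `word`."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    return ("".join(out) + "000")[:4]


def build_phonetic_db(names):
    """Return sorted records of the form '<soundex code><TAB><name>'."""
    return sorted(f"{soundex(n)}\t{n}" for n in names)


def phonetic_lookup(db, query):
    """Return the names whose Soundex code equals that of `query`."""
    key = soundex(query) + "\t"  # code plus delimiter gives an exact key match
    i = bisect_left(db, key)
    found = []
    while i < len(db) and db[i].startswith(key):
        found.append(db[i].split("\t", 1)[1])
        i += 1
    return found


if __name__ == "__main__":
    db = build_phonetic_db(["Smith", "Rupert", "Robert"])
    print(phonetic_lookup(db, "Rupert"))
```

The keyword is converted to its code before the search, so names that sound alike land on the same key and are returned together.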
If several flat file databases/tables are appended simultaneously across a network and/or volumes, is there a concurrency issue with ndex? Yes, a lock agent will have to be implemented to avoid the phantom record problem. Although the lock agent itself is rather straightforward, the network-wide and volume-wide fault and exception handling is not. Bear in mind that ndex was implemented for a constant database, where updates are appended to the end of the database file, as is the usual case for financial time series databases. As an example of the complexity when using semaphore locks, see dbappend in the dbappend.shar.gz archive.
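For the simplest case only, several writers on one host appending to one local file, an advisory lock is enough to serialize the appends. The Python sketch below uses flock(2) through the fcntl module. It is not the dbappend approach, which uses semaphores, and it does not address the network-wide and volume-wide fault handling mentioned above; the function name append_record is hypothetical.

```python
import fcntl
import os
import tempfile


def append_record(path, record):
    """Append one record to the file while holding an exclusive advisory lock."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # blocks until other writers finish
        try:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())           # make the record durable before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "db")
        append_record(db, "200701020930\t10.0")
        append_record(db, "200701021000\t10.5")
        with open(db) as f:
            print(f.read(), end="")
```

The lock is advisory, so it only helps if every writer uses the same routine, and flock(2) locks are not reliable on every network filesystem.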

Last modified: Thu Mar 22 18:06:38 PDT 2007 $Id: FAQs.html,v 1.0 2007/03/23 01:11:10 conover Exp $