The ndex
program requires mmap(2) to map the database file into the
Unix VM system, meaning the database file is not copied into
allocated memory. The database file resides in the Unix VM
system, and access is quite quick since it uses the hardware
LRU/MRU/tag table/cache memory page system for access/search,
and there are no practical limits, (other than integer size,)
on the size of the file.
The binary search algorithm is implemented as follows:
Recursively binary search a memory area, referenced
by the begin and end variables, for string. The variable
begin references the beginning of the area to be searched;
the variable end references the end of the area to be
searched, and will always reference the '\n' terminating
character of the last record of the memory area, (which is a
requirment); the variable string is the null terminated
string to search for, and has a length represented by the
variable string_len.
Requirements:
The memory area size must be greater than zero,
(this is a requirement for mmap(1), also.)
The last character of the memory area must be a
'\n' text file record terminator, e.g., the last record,
(which, minimally, may be the only record,) must be
terminated.
The records in the memory area must be in lexical
sequence, (consistent with strcmp() and
strncmp().)
The general strategy is:
For each iteration in the recursion:
Divide the memory area exactly in two.
Work from this middle, in both directions, to
find the end of the preceding record, (or beginning of
file, whichever comes first) and, the end of the
current record:
The current record contains the first
instance of the the terminating character at, or
past, the middle.
The current record's beginning is the next
character after the preceding records's
terminating character, or the beginning of the
memory area.
Compare this record with the string for a
match.
If not a match, decide which direction to move
for the next iteration, and call this function,
again.
The recursive binary search terminates when a
match is found, or when there is no more memory area
to search.
Return a reference to the matched area in memory,
or null if no match.
A license is hereby granted to reproduce this software
source code and to create executable versions from this source
code for personal, non-commercial use. The copyright notice
included with the software must be maintained in all copies
produced.
THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO
WARRANTIES WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING
WARRANTIES OF MERCHANTABILITY, TITLE, OR FITNESS FOR ANY
PARTICULAR PURPOSE. THE AUTHOR DOES NOT WARRANT THAT USE OF
THIS PROGRAM DOES NOT INFRINGE THE INTELLECTUAL PROPERTY
RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.
So there.
Copyright © 1994-2007, John Conover, All Rights
Reserved.
Comments and/or bug reports should be addressed to:
- john@email.johncon.com
- http://www.johncon.com/
- http://www.johncon.com/ntropix/
- http://www.johncon.com/ndustrix/
- http://www.johncon.com/nformatix/
- http://www.johncon.com/ndex/
- John Conover
- john@email.johncon.com
- January 6, 2005
|