Newsportal USENET - Re: Does cheating produce faster searches?

Luc <luc@sep.invalid> wrote:

Suppose I have a large list. Very large list. Then I want to search
for an item in that string:

% lsearch $list "string"

Now, suppose I have many lists instead. One list contains all the items
that begin with the letter a, another list contains all the items that
begin with the letter b, another list contains all the items that begin
with the letter c, and so on. Then I see what the first character in
"string" is and only search for it in the one corresponding list.

What you have described is, at a very high level, similar to how B-Tree
indexes work.

https://en.wikipedia.org/wiki/B-tree

And they are the common index type used in databases for faster data
retreival.

Would that be faster? I somehow suspect the answer is 'no.'

The answer is actually "it depends". You'd have to define "very
large". For some value of "very large" the extra time spent to pick
which sublist to search will most likely be faster than doing a search
across a single list containing all the items.

But, as is always the case, the algorithm matters.

For basic 'lsearch', it does, by default, a linear search. It starts
at index 0, and looks at each item sequentially until it finds a match,
or hits the end of the list. For this algorithm, your "separate based
on first character" method will be faster for a not-so-big value of
"very large".

But, lsearch also has the "-sorted" option. For a "very large" list
that you can maintain in sorted order, using "-sorted" causes lsearch
to perform a binary search upon the list.

https://en.wikipedia.org/wiki/Binary_search

For this alternate algorithm, your "separate lists" method will need a
much larger value for "very large" vs. the linear search without
"-sorted" before the "separate lists" are faster.

However, "-sorted" now leaves you with the need to have "very large
list" be sorted. And the time needed to sort the list, for a "very
large" list, could be substantial. So you'd need, in this case, a way
to either only sort once, but use many times, or if the list is being
modified, you'd want to be careful to modify it such that it is still
sorted vs. needing to "sort" it over and over again.

Bonus question: what about sqlite operations? Would they be faster if
I had one separate table for each initial letter/character?

Again, it depends.

If you have no indexes on your tables, then likely yes.

If you have indexes on the tables, that can also be utilized by the
query you are running, then the size of the table will need to be much
much larger before the "split up" version might become faster.

Date	Sujet	#	Auteur
25 Sep 24	Does cheating produce faster searches?	5	Luc
25 Sep 24	Re: Does cheating produce faster searches?	1	Rich
27 Sep 24	Re: Does cheating produce faster searches?	3	Shaun Deacon
27 Sep 24	Re: Does cheating produce faster searches?	2	Luc
27 Sep 24	Re: Does cheating produce faster searches?	1	Rich