Description

Bugs in the new search engine:

It makes sense that "term" finds only "term", just as "two words" finds only "two words".

It makes sense that "term" finds only whole words.

IIRC the old search did exactly the same. -- FlorianFesti 2004-10-02 13:50:14

Details

MoinMoin Version

1.3--patch-152

Workaround

Discussion

HelpOnSearching says:

This might be viewed as a feature. But common sense says that when you search for "something" or "this and that", you are looking for whole words. As this is the default, I think it should use this behavior.

If one wants to look for part of a word, one can use r:something or r:this\sand\sthat.
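To make the distinction concrete, here is a small sketch in Python (the language MoinMoin is written in) using the standard re module. It is not MoinMoin's search code: the function names substring_match and whole_word_match are hypothetical, and case-insensitive matching is an assumption. The first behaves like an r: regex search, the second wraps the escaped term in \b word boundaries.

```python
import re

def substring_match(term, text):
    """Find the term anywhere, even inside a longer word (like r:term)."""
    return re.search(re.escape(term), text, re.IGNORECASE) is not None

def whole_word_match(term, text):
    """Find the term only where it is bounded by non-word characters."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

text = "Something like this and that, plus a battleship."
print(substring_match("thing", text))           # True: found inside "Something"
print(whole_word_match("thing", text))          # False: only part of a word
print(whole_word_match("this and that", text))  # True: a whole phrase
```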

Google search works like this:

It does not say anything about whole words; I wonder how they treat this issue.

In Hebrew it does make sense NOT to look for whole words, as the same word in Hebrew can appear with extra letters. For example, "to a wiki" is written as "lewiki": the word "wiki" gets a "le" prefix. "to the wiki" is "lawiki" and "the wiki" is "hawiki" (Hebrew is like Perl: short and hard to read). So when you look for "wiki" you want to find the same word with the "le" or "la" or other prefixes or suffixes.

But even in Hebrew, one does not expect to get something that is not inside the "search term".
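Prefix-aware matching could be sketched like this, assuming a hypothetical, hard-coded list of transliterated prefixes taken from the Hebrew examples above ("le", "la", "ha"). A real implementation would need the prefixes in Hebrew script and would probably handle suffixes too; the sketch only shows that whole-word matching with optional prefixes still rejects unrelated words.

```python
import re

# Hypothetical list of transliterated prefixes from the examples above;
# a real list would be longer and written in Hebrew script.
PREFIXES = ("le", "la", "ha")

def prefixed_word_pattern(term):
    """Match term as a whole word, optionally preceded by one known prefix."""
    alternatives = "|".join(re.escape(p) for p in PREFIXES)
    return re.compile(
        r"\b(?:" + alternatives + r")?" + re.escape(term) + r"\b",
        re.IGNORECASE,
    )

pattern = prefixed_word_pattern("wiki")
words = ["wiki", "lewiki", "lawiki", "hawiki", "wikipedia", "kiwiki"]
print([w for w in words if pattern.search(w)])
# ['wiki', 'lewiki', 'lawiki', 'hawiki']
```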

It seems that it does not make sense in English, though. Does it make sense to make the search language-aware? How do you define the language of a search? I think the best would be one default for all languages. If the behavior works badly for some language, we can make it a configuration option, as each wiki usually has one language and the wiki admin can choose what works better.

The question is what the common case is that most people expect; that should be the default. We also must add many more examples of common search tasks to HelpOnSearching.

Related links:

"Real" search engines use indices. They tweak each word before entering it into the index. If you add "\b" to the beginning and end of the word, you will no longer find things you want to have, like words with a plural s or grammatical endings, which are more common in most other European languages than in English. In German, compound words are built by joining other words: battle ship is Schlachtschiff in German, while Schlacht is battle (the literal translation is slaughter, btw) and Schiff is ship. If you are searching for ship, you want to find battle ship, too.
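A minimal sketch of such an index, assuming a deliberately crude normalization step (lowercasing and stripping a trailing plural "s") in place of a real stemmer. The names normalize and build_index and the sample pages are hypothetical. It shows both sides of the argument: the plural form is found, but a compound like Schlachtschiff is indexed as a single word, so a lookup for "schiff" misses it.

```python
import re
from collections import defaultdict

def normalize(word):
    """Crude normalization: lowercase and strip a trailing plural 's'.
    Illustrative only; this is not a real stemmer."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word

def build_index(pages):
    """Map each normalized word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"\w+", text):
            index[normalize(word)].add(name)
    return index

pages = {
    "FleetPage": "Two ships left the harbour.",
    "NavyPage": "The Schlachtschiff was launched.",
}
index = build_index(pages)
print(sorted(index[normalize("ship")]))    # ['FleetPage'] ("ships" was normalized)
print(sorted(index[normalize("schiff")]))  # [] (the compound is indexed whole)
```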

No, this is not the issue. The issue is what "quoted words", or even a single "word", mean.

I see no difference between "ship" and ship. The quotes are intended for quoting characters that have special meaning.

We can add ranking to the system, and this can solve the cases that are not clear. For example, if I get a relevance bar for each search result, let's say a system of stars [*****] for very relevant and empty [     ] for not relevant, I can show both results, but sort them in a useful way.

Example: search for "ship"

No  Rank     Found
1   *****    Ship
2   ***      battleship
3   ***      othership
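The ranking in the example table can be reproduced with a very simple scoring rule. The rule itself is an assumption for illustration (an exact, case-insensitive match scores five stars, a match inside a longer word scores three), as are the function names rank and ranked_results; it is not MoinMoin's existing ranking.

```python
def rank(term, word):
    """Hypothetical score: 5 for an exact (case-insensitive) match,
    3 when the term only appears inside a longer word, 0 otherwise."""
    if word.lower() == term.lower():
        return 5
    if term.lower() in word.lower():
        return 3
    return 0

def ranked_results(term, words):
    """Keep matching words, sorted by descending score.
    Python's sort is stable, so ties keep their original order."""
    hits = [(rank(term, w), w) for w in words if rank(term, w) > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits

results = ranked_results("ship", ["battleship", "Ship", "othership", "harbour"])
for no, (score, word) in enumerate(results, start=1):
    print(no, "[" + "*" * score + " " * (5 - score) + "]", word)
# 1 [*****] Ship
# 2 [***  ] battleship
# 3 [***  ] othership
```

Because nothing is filtered out except non-matches, both the whole-word hit and the partial hits are shown; only their order changes.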

We already have a kind of ranking system. For now it is quite simple. We can improve it in the 1.4 branch; I would prefer not to add further features to 1.3.

I would like to see this feature here (let's call it Google semantics, for a parser that stole 80% of its ideas from Google): the parser tries to replace "ham" by regex:\bham\b if it makes sense. Why? Quite easy: you cannot really expect someone to learn regex just to do a phrase search. It is not documented? Oh, easy to change. It is difficult to implement? Not really. It breaks the old search semantics! Yes, but I wonder whether anyone really uses the ""-for-space-escaping mode. If you really need battle ship[s], then write a regex (as regexes should not be transformed as described above). -- AlexanderSchremmer 2004-11-29 22:27:09
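The proposed rewrite could look roughly like the following sketch. rewrite_term and search are hypothetical helpers, not MoinMoin's parser, and the reading of "if it makes sense" is an assumption: word-boundary anchors are only added next to word characters (a \b beside "+" in "C++" would not mean a word boundary), and terms starting with regex: or r: are passed through unchanged.

```python
import re

def rewrite_term(term):
    """Turn a plain or quoted term into a whole-word regex query.
    Hypothetical rules: regex terms are untouched, surrounding double
    quotes are dropped, and word-boundary anchors are added only next
    to word characters."""
    if term.startswith(("regex:", "r:")):
        return term
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        term = term[1:-1]
    pattern = re.escape(term)
    if term and re.match(r"\w", term[0]):
        pattern = r"\b" + pattern
    if term and re.match(r"\w", term[-1]):
        pattern = pattern + r"\b"
    return "regex:" + pattern

def search(term, text):
    """Apply the rewritten query to text, case-insensitively."""
    query = rewrite_term(term)
    pattern = query.split(":", 1)[1]  # drop the "regex:" or "r:" prefix
    return re.search(pattern, text, re.IGNORECASE) is not None

text = "We had ham and eggs on the battleships."
print(rewrite_term('"ham"'))             # regex:\bham\b
print(search('"ham"', text))             # True
print(search('"ham"', "Shampoo"))        # False: only whole words match
print(search("r:battle ?ships?", text))  # True: regexes are left alone
```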

Plan

Add an example that shows how "quoted words" also finds parts of other words.


CategoryMoinMoinNoBug

MoinMoin: MoinMoinBugs/NewSearchEngine (last edited 2007-10-29 19:09:06 by localhost)