Stemming
Stemming is the process of reducing the words in a search phrase down to a base form. Words with the same meaning should be stemmed down to the same word. The purpose is to be able to find search results without having to match the right form of a word.
E.g. "walking" and "walked" would be stemmed down to the same word (probably "walk"), since a search for one should also return results containing the other.
How words are stemmed depends on the language of the segment. For example, in an English text the plural endings "s" and "es" would be removed, while in a Danish text the plural endings "ne" and "ene" would be removed.
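The language-dependent suffix removal described above can be sketched roughly as follows. This is a minimal illustration, not the actual stemmer: the `SUFFIXES` table and the `strip_suffix` function are hypothetical names, the table holds only the four endings mentioned in the text, and the minimum stem length of two characters is an assumption.

```python
# Hypothetical suffix table containing only the endings mentioned in the text.
# A real stemmer applies many more rules per language.
SUFFIXES = {
    "en": ["es", "s"],    # English plural endings
    "da": ["ene", "ne"],  # Danish plural endings
}


def strip_suffix(word, language):
    """Remove the longest matching suffix for the given language, if any."""
    word = word.lower()
    # Try longer suffixes first so "ene" wins over "ne" and "es" over "s".
    for suffix in sorted(SUFFIXES.get(language, []), key=len, reverse=True):
        # Assumption: keep at least two characters of the original word.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


print(strip_suffix("boxes", "en"))   # English plural ending removed
print(strip_suffix("husene", "da"))  # Danish plural ending removed
```

Because the suffix list is looked up by language, the same word can be stemmed differently depending on the language of the segment it appears in.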
Stemmed words are not always real words. E.g. "jumping" might be reduced to "jumpi".
Stemming does not always work as intended:
- When a word doesn't follow normal grammatical rules. E.g. the plural of "Mouse" is "Mice" instead of "Mouses". This can be fixed with irregular words.
- When multiple words, which mean different things, are stemmed to the same word. E.g. "Training" will become "Train" after "ing" is removed. This can be fixed with stemming overrides.
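One way to picture the two fixes is as lookup tables that are consulted before any suffix is removed. The sketch below only illustrates that idea. The text does not say how irregular words and stemming overrides are actually applied, so the table names, their contents, the lookup order and the simplified `naive_stem` rules are all assumptions.

```python
# Hypothetical tables; in practice these would be configured per language.
IRREGULAR_WORDS = {"mice": "mouse"}            # irregular form -> base form
STEMMING_OVERRIDES = {"training": "training"}  # word -> stem to use instead


def naive_stem(word):
    """Very simplified suffix removal, for illustration only."""
    for suffix in ("ing", "es", "s"):
        # Assumption: keep at least three characters of the original word.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word):
    """Apply overrides and irregular words before falling back to suffix removal."""
    word = word.lower()
    if word in STEMMING_OVERRIDES:
        return STEMMING_OVERRIDES[word]
    if word in IRREGULAR_WORDS:
        return IRREGULAR_WORDS[word]
    return naive_stem(word)


print(stem("Mice"))      # mapped to the same stem as "Mouse"
print(stem("Training"))  # kept apart from "Train" by the override
print(stem("Trains"))    # regular suffix removal still applies
```

With these tables, "Mice" and "Mouse" end up with the same stem, while "Training" no longer collapses into "Train".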