Stemming
Stemming is the process of reducing the words in a search phrase to a base form. Different forms of the same word should be stemmed to the same base form. The purpose is to find relevant search results without the query having to match the exact form of a word.
E.g. walking and walked would both be stemmed to the same word (probably walk), because a search for one should also return results containing the other.
How words are stemmed depends on the language of the segment. As an example, the plural endings s and es are among the suffixes removed from English text, while in Danish the plural endings ne and ene are removed.
Stemmed words are not always real words. E.g. jumping might be reduced to jumpi.
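To make the language-dependent suffix removal concrete, here is a minimal Python sketch. It is not how any particular search engine implements stemming: the SUFFIXES table, the naive_stem function, and the three-character minimum are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny suffix lists per language code.
# Real stemmers use far richer rule sets than this.
SUFFIXES = {
    "en": ["ing", "ed", "es", "s"],
    "da": ["ene", "ne"],
}


def naive_stem(word: str, language: str) -> str:
    """Strip the longest matching suffix for the given language, if any."""
    lowered = word.lower()
    # Try longer suffixes first so that "ene" wins over "ne".
    for suffix in sorted(SUFFIXES.get(language, []), key=len, reverse=True):
        # Assumed safeguard: keep at least three characters of the word.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


print(naive_stem("walking", "en"))  # walk
print(naive_stem("walked", "en"))   # walk
print(naive_stem("houses", "en"))   # hous
print(naive_stem("husene", "da"))   # hus
```

With these assumed rules, walking and walked reduce to the same stem, while houses reduces to hous, which is not a real word.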
Stemming does not always work as intended:
- When a word doesn't follow normal grammatical rules. E.g. Mouse becomes Mice instead of Mouses. This can be fixed with irregular words.
- When multiple words, which mean different things, are stemmed to the same word. E.g. Training will become Train after ing is removed. This can be fixed with stemming overrides.
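The two fixes can be pictured as lookup tables that are consulted before the normal suffix rules run. The Python sketch below is only an illustration of that idea. The names IRREGULAR_WORDS, STEMMING_OVERRIDES, strip_suffix and stem, the table entries, and the order in which the tables are checked are assumptions, not the product's actual configuration format or behavior.

```python
# Hypothetical lookup tables; entries and names are illustrative only.
IRREGULAR_WORDS = {"mice": "mouse"}            # irregular form -> base form
STEMMING_OVERRIDES = {"training": "training"}  # word -> stem to use instead


def strip_suffix(word: str) -> str:
    """Very rough rule-based step: drop a trailing 'ing', 'es' or 's'."""
    for suffix in ("ing", "es", "s"):
        # Assumed safeguard: keep at least three characters of the word.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    """Check overrides, then irregular words, then fall back to the rules."""
    lowered = word.lower()
    if lowered in STEMMING_OVERRIDES:
        return STEMMING_OVERRIDES[lowered]
    if lowered in IRREGULAR_WORDS:
        return IRREGULAR_WORDS[lowered]
    return strip_suffix(lowered)


print(strip_suffix("training"))  # train (rules alone collide with "train")
print(stem("Training"))          # training (the override keeps it distinct)
print(stem("Mice"))              # mouse
print(stem("trains"))            # train
```

In this sketch, an override keeps Training from collapsing into Train, and the irregular-word entry maps Mice to the same stem as Mouse.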