Stemming

Stemming is the process of reducing the words in a search phrase to a base form. Different forms of the same word should be stemmed to the same base form, so that a search finds relevant results without having to match the exact form of a word. For example, walking and walked would be stemmed to the same word (probably walk), because when searching for one, results containing the other are also relevant.

How words are stemmed depends on the language of the segment. For example, in English text the plural endings s and es are among the things removed, while in Danish text the plural endings ne and ene are removed.
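
As a rough illustration of language-dependent suffix removal, the sketch below strips only the plural endings mentioned above. The SUFFIXES table, the language codes and the strip_plural function are hypothetical names made up for this example; a real stemmer applies many more rules per language.

```python
# Hypothetical suffix tables holding only the endings named in the text.
# Longer endings come first so "ene" is tried before "ne", and "es" before "s".
SUFFIXES = {
    "en": ("es", "s"),
    "da": ("ene", "ne"),
}


def strip_plural(word: str, language: str) -> str:
    """Remove the first matching plural ending for the given language."""
    lowered = word.lower()
    for suffix in SUFFIXES.get(language, ()):
        # Assumption: keep at least three characters so short words survive.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    # No rule matched (or the language is unknown): return the word unchanged.
    return lowered


print(strip_plural("boxes", "en"))   # English plural "es"
print(strip_plural("husene", "da"))  # Danish plural "ene"
```

With these naive rules, houses is reduced to hous, which is not a real English word.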

Stemmed words are not always real words. For example, jumping might be reduced to jumpi.

Stemming does not always work as intended:

  • When a word doesn't follow the normal grammatical rules. For example, the plural of mouse is mice, not mouses, so removing a plural ending does not reduce both forms to the same word. This can be fixed with irregular words.
  • When multiple words that mean different things are stemmed to the same word. For example, training becomes train after ing is removed, so it matches the unrelated word train. This can be fixed with stemming overrides.
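
One way to picture the two fixes above is as lookup tables that are consulted before the ordinary suffix rules run. This is a minimal sketch under that assumption, not a description of how irregular words or stemming overrides are actually stored or applied; IRREGULAR_WORDS, STEMMING_OVERRIDES and stem are hypothetical names.

```python
# Hypothetical tables: each maps a word to the stem it should receive.
IRREGULAR_WORDS = {"mice": "mouse"}            # forms the suffix rules cannot relate
STEMMING_OVERRIDES = {"training": "training"}  # words the suffix rules must not touch

# Naive English endings, longest first.
SUFFIXES = ("ing", "es", "s")


def stem(word: str) -> str:
    """Stem a word, letting overrides and irregular words win over suffix rules."""
    lowered = word.lower()
    if lowered in STEMMING_OVERRIDES:
        return STEMMING_OVERRIDES[lowered]
    if lowered in IRREGULAR_WORDS:
        return IRREGULAR_WORDS[lowered]
    for suffix in SUFFIXES:
        # Assumption: keep at least three characters of the word.
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


print(stem("Mice"), stem("mouse"))       # both forms share one stem
print(stem("Training"), stem("trains"))  # the override keeps these apart
```

Checking the tables first means an exception always takes precedence over the general rules, so adding an entry never requires changing the rules themselves.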