Pipeline analysis
The pipeline analysis tool shows how the pipeline of the search engine affects a search phrase.
The purpose of the pipeline is to match all words with similar meanings without matching unrelated words. For example, searching for plank will also find results containing planks, but not planes.
This page explains the individual steps of the text analysis pipeline in more detail.
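As a rough mental model, the pipeline can be pictured as an ordered list of functions, each taking the output of the previous one. The sketch below is only an illustration under that assumption: the names `run_pipeline`, `lowercase`, and `drop_ignored` are hypothetical, the whitespace split is a stand-in for real tokenization, and the ignored words are made up.

```python
from typing import Callable, List

Tokens = List[str]
Step = Callable[[Tokens], Tokens]


def lowercase(tokens: Tokens) -> Tokens:
    """Transform every token to lowercase."""
    return [t.lower() for t in tokens]


def drop_ignored(tokens: Tokens) -> Tokens:
    """Remove tokens listed in a (hypothetical) ignored words dictionary."""
    ignored = {"the", "is"}
    return [t for t in tokens if t not in ignored]


def run_pipeline(phrase: str, steps: List[Step]) -> Tokens:
    """Split the phrase on whitespace, then apply each step in order."""
    tokens = phrase.split()  # simplistic stand-in for the tokenization step
    for step in steps:
        tokens = step(tokens)
    return tokens


print(run_pipeline("The brown cat is sleeping", [lowercase, drop_ignored]))
# ['brown', 'cat', 'sleeping']
```

The order of the steps matters in this sketch: if `drop_ignored` ran before `lowercase`, the capitalized "The" would not match the dictionary entry "the" and would survive.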
HTML strip
Translates HTML-encoded symbols.
100&euro; → 100€
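Decoding HTML entities can be illustrated with Python's standard library. This is a sketch of the general technique, not the search engine's own implementation; `decode_entities` is a hypothetical helper name.

```python
import html


def decode_entities(text: str) -> str:
    """Translate HTML-encoded symbols such as &euro; into plain characters."""
    return html.unescape(text)


print(decode_entities("100&euro;"))  # 100€
```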
Language normalization
Removes or replaces symbols that don't have any meaning in the language of the current segment.
Mr. Röcket → Mr. Rocket
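One common way to get the effect shown above is to decompose characters with Unicode normalization and drop the combining marks. The sketch below assumes a language in which ö carries no meaning of its own; in the actual pipeline, which symbols are removed or replaced depends on the language of the segment. `strip_diacritics` is a hypothetical name.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Mr. Röcket"))  # Mr. Rocket
```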
Tokenization
Splits the search phrase into individual tokens, the text units the search engine uses to match products with a search phrase.
[the brown cat] → [the] [brown] [cat]
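A minimal tokenizer can be sketched with a regular expression. The rule used here (runs of word characters, optionally joined by apostrophes) is an assumption chosen for illustration; the search engine's actual tokenization rules are not described on this page.

```python
import re
from typing import List

# Word characters, optionally followed by apostrophe-joined parts (e.g. john's).
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*")


def tokenize(phrase: str) -> List[str]:
    """Split a search phrase into individual tokens."""
    return TOKEN_PATTERN.findall(phrase)


print(tokenize("the brown cat"))  # ['the', 'brown', 'cat']
```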
Lowercase
Transforms tokens to lowercase.
rOCkET → rocket
Elision
Removes elisions.
L'Astronaut → Astronaut
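Elision removal can be sketched as stripping a short contracted prefix followed by an apostrophe. The prefix list below (French-style contractions such as l', d', qu') is an assumption for illustration, not the list the search engine uses.

```python
import re

# Hypothetical list of elided prefixes; the real list depends on the language.
ELISION_PATTERN = re.compile(r"^(?:qu|[ldjmtsnc])['’](?=\w)", re.IGNORECASE)


def remove_elision(token: str) -> str:
    """Strip a leading elided article or pronoun from a token."""
    return ELISION_PATTERN.sub("", token)


print(remove_elision("L'Astronaut"))  # Astronaut
```

Because the pattern is anchored at the start of the token, a possessive such as john's is left untouched by this step.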
Multi-words
Combines multi-words into single tokens.
[ice] [cream] → [ice_cream]
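Combining multi-words can be sketched as a scan that looks for the longest known sequence at each position. The dictionary format (a set of token tuples) and the name `combine_multi_words` are assumptions made for this example.

```python
from typing import List, Set, Tuple


def combine_multi_words(tokens: List[str], multi_words: Set[Tuple[str, ...]]) -> List[str]:
    """Merge adjacent tokens that form a known multi-word, longest match first."""
    max_len = max((len(m) for m in multi_words), default=1)
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in multi_words:
                result.append("_".join(candidate))
                i += n
                break
        else:
            # No multi-word starts here; keep the token unchanged.
            result.append(tokens[i])
            i += 1
    return result


print(combine_multi_words(["vanilla", "ice", "cream"], {("ice", "cream")}))
# ['vanilla', 'ice_cream']
```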
Misspellings
Replaces some tokens, based on rules set up in the misspellings dictionaries.
Calender → Calendar
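A misspellings dictionary can be pictured as a simple lookup from the misspelled token to its replacement. The dictionary contents and the function name below are hypothetical, and the tokens are shown lowercased because this step runs after the lowercase step.

```python
from typing import Dict, List


def fix_misspellings(tokens: List[str], misspellings: Dict[str, str]) -> List[str]:
    """Replace tokens that have an entry in the misspellings dictionary."""
    return [misspellings.get(token, token) for token in tokens]


print(fix_misspellings(["wall", "calender"], {"calender": "calendar"}))
# ['wall', 'calendar']
```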
Ignored words
Removes some tokens, based on rules set up in the ignored words dictionaries.
[the] [brown] [cat] [is] [sleeping] → [brown] [cat] [sleeping]
Possessive s stemmer
Removes the possessive 's from tokens.
[john's] [rocket] → [john] [rocket]
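The effect of this step can be sketched with a regular expression that removes a trailing apostrophe followed by s. Handling of both the straight and the typographic apostrophe is an assumption; `strip_possessive` is a hypothetical name.

```python
import re
from typing import List

POSSESSIVE_PATTERN = re.compile(r"['’]s$")


def strip_possessive(tokens: List[str]) -> List[str]:
    """Remove a trailing possessive 's from each token."""
    return [POSSESSIVE_PATTERN.sub("", token) for token in tokens]


print(strip_possessive(["john's", "rocket"]))  # ['john', 'rocket']
```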
Stemming overrides
Manipulates the stemming of words, based on rules set up in the stemming overrides dictionaries.
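One way to picture a stemming override is as a lookup that takes precedence over the language stemmer. The sketch below assumes that behavior; the override entry, the naive plural stemmer, and the function names are all hypothetical.

```python
from typing import Callable, Dict


def naive_plural_stemmer(token: str) -> str:
    """Hypothetical stand-in for a language stemmer: drops a trailing s."""
    return token[:-1] if token.endswith("s") else token


def stem_with_overrides(token: str, overrides: Dict[str, str],
                        stemmer: Callable[[str], str]) -> str:
    """Use the override if one exists, otherwise fall back to the stemmer."""
    if token in overrides:
        return overrides[token]
    return stemmer(token)


overrides = {"news": "news"}  # keep "news" from being stemmed to "new"
print(stem_with_overrides("news", overrides, naive_plural_stemmer))     # news
print(stem_with_overrides("rockets", overrides, naive_plural_stemmer))  # rocket
```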
Language stemmer
Shortens words down to a common base form by following the stemming rules of the language.
[dancing] [dance] [dancer] → [danc] [danc] [danc]
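The idea of reducing related words to a shared base form can be shown with a toy suffix stripper. This is deliberately simplistic and is not the stemmer the search engine uses: real language stemmers apply many more rules and may produce different stems. The suffix list and minimum stem length are assumptions chosen to reproduce the example above.

```python
SUFFIXES = ("ing", "er", "e")  # toy rule set, checked in this order
MIN_STEM_LENGTH = 3


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LENGTH characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LENGTH:
            return token[:-len(suffix)]
    return token


print([toy_stem(t) for t in ["dancing", "dance", "dancer"]])
# ['danc', 'danc', 'danc']
```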
Hypernyms
Makes some tokens also match alternative tokens, based on rules set up in the hypernyms dictionary.
[pasta] → [pasta] | [lasagne] | [tortelli]
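Hypernym expansion can be sketched as a one-directional lookup: the broader term also matches its narrower terms, but not the other way around. That directionality is an assumption based on the example above, and the dictionary format and function name are hypothetical.

```python
from typing import Dict, List


def expand_hypernyms(tokens: List[str], hypernyms: Dict[str, List[str]]) -> List[List[str]]:
    """Return, for each token, the list of alternatives it may match."""
    return [[token] + hypernyms.get(token, []) for token in tokens]


hypernyms = {"pasta": ["lasagne", "tortelli"]}
print(expand_hypernyms(["pasta"], hypernyms))    # [['pasta', 'lasagne', 'tortelli']]
print(expand_hypernyms(["lasagne"], hypernyms))  # [['lasagne']]
```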
Synonyms
Makes some tokens also match alternative tokens, based on rules set up in the synonyms dictionary.
[space] → [space] | [cosmos]
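Synonyms can be sketched as groups in which every member matches every other member. Whether the synonyms dictionary is actually symmetric is an assumption here, as are the group format and the function names.

```python
from typing import Dict, List


def build_synonym_index(groups: List[List[str]]) -> Dict[str, List[str]]:
    """Map each word to itself followed by the other members of its group."""
    index: Dict[str, List[str]] = {}
    for group in groups:
        for word in group:
            index[word] = [word] + [w for w in group if w != word]
    return index


def expand_synonyms(tokens: List[str], index: Dict[str, List[str]]) -> List[List[str]]:
    """Return, for each token, the list of alternatives it may match."""
    return [index.get(token, [token]) for token in tokens]


index = build_synonym_index([["space", "cosmos"]])
print(expand_synonyms(["space", "rocket"], index))
# [['space', 'cosmos'], ['rocket']]
```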
Irregular words
Makes some tokens also match alternative tokens, based on rules set up in the irregular words dictionary.
[woman] → [woman] | [women]