Pipeline analysis

The pipeline analysis tool shows how the pipeline of the search engine affects a search phrase.

The purpose of the pipeline is to match words with similar meanings without matching unrelated words. For example, searching for plank also finds results containing planks, but not planes.

This page explains the individual steps of the text analysis pipeline in more detail.

HTML strip

Translates HTML-encoded symbols.

100&euro; → 100€
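
As a rough illustration of this step, the sketch below decodes HTML character references with Python's standard `html.unescape`. The function name `decode_html_entities` is hypothetical, and the search engine's own implementation may differ.

```python
import html


def decode_html_entities(text: str) -> str:
    """Replace HTML character references (such as &euro;) with the characters they encode."""
    return html.unescape(text)


print(decode_html_entities("100&euro;"))  # 100€
```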

Language normalization

Removes or replaces symbols that don't have any meaning in the language of the current segment.

Mr. Röcket → Mr. Rocket
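
One possible way to approximate this step is to strip diacritics using Unicode decomposition. This is only a sketch: it assumes a language in which accents carry no meaning, and the actual rules depend on the language of the segment. The function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks, e.g. 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Mr. Röcket"))  # Mr. Rocket
```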

Tokenization

Splits the search phrase into tokens, the individual text units the search engine uses to match products with a search phrase.

[the brown cat] → [the] [brown] [cat]
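
A minimal sketch of tokenization, assuming tokens are simply separated by whitespace. A real tokenizer typically also handles punctuation and language-specific rules; the function name `tokenize` is hypothetical.

```python
import re


def tokenize(phrase: str) -> list[str]:
    """Split a phrase into tokens on runs of whitespace."""
    return re.findall(r"\S+", phrase)


print(tokenize("the brown cat"))  # ['the', 'brown', 'cat']
```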

Lowercase

Transforms tokens to lowercase.

rOCkET → rocket
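
Lowercasing can be sketched in a single expression. The function name `lowercase_tokens` is hypothetical.

```python
def lowercase_tokens(tokens: list[str]) -> list[str]:
    """Return the tokens converted to lowercase."""
    return [token.lower() for token in tokens]


print(lowercase_tokens(["rOCkET"]))  # ['rocket']
```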

Elision

Removes elisions.

L'Astronaut → Astronaut
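
The sketch below removes an elided prefix from the start of a token. The list of prefixes is an assumption based on common French elisions; which elisions the search engine actually removes is not documented here. The names `ELISION_PREFIX` and `remove_elision` are hypothetical.

```python
import re

# Assumed prefixes: common French elided forms followed by a straight or curly apostrophe.
ELISION_PREFIX = re.compile(r"^(?:l|d|j|m|t|s|n|c|qu)['\u2019]", re.IGNORECASE)


def remove_elision(token: str) -> str:
    """Strip a leading elided article or pronoun such as L' or d'."""
    return ELISION_PREFIX.sub("", token)


print(remove_elision("L'Astronaut"))  # Astronaut
```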

Multi-words

Combines multi-words into single tokens.

[ice] [cream] → [ice_cream]
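
One way to picture this step is a scan over the token list that joins any run of tokens found in a multi-word dictionary. The dictionary format (a set of token tuples) and the name `merge_multi_words` are assumptions made for this sketch.

```python
def merge_multi_words(tokens: list[str], multi_words: set[tuple[str, ...]]) -> list[str]:
    """Join adjacent tokens that form a known multi-word, preferring the longest match."""
    phrases = sorted(multi_words, key=len, reverse=True)
    merged = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                merged.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No multi-word starts at this position, so keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_multi_words(["ice", "cream"], {("ice", "cream")}))  # ['ice_cream']
```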

Misspellings

Replaces some tokens, based on rules set up in the misspellings dictionaries.

Calender → Calendar
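
Conceptually, a misspellings dictionary can be treated as a lookup table from a misspelled token to its correction. The dictionary contents and the name `fix_misspellings` below are hypothetical; the tokens are shown in lowercase on the assumption that the lowercase step has already run.

```python
def fix_misspellings(tokens: list[str], misspellings: dict[str, str]) -> list[str]:
    """Replace each token that has an entry in the misspellings dictionary."""
    return [misspellings.get(token, token) for token in tokens]


print(fix_misspellings(["calender"], {"calender": "calendar"}))  # ['calendar']
```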

Ignored words

Removes some tokens, based on rules set up in the ignored words dictionaries.

[the] [brown] [cat] [is] [sleeping] → [brown] [cat] [sleeping]
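
Removing ignored words amounts to filtering the token list against a set. The set contents and the name `remove_ignored_words` are assumptions for illustration.

```python
def remove_ignored_words(tokens: list[str], ignored: set[str]) -> list[str]:
    """Drop every token that appears in the ignored words set."""
    return [token for token in tokens if token not in ignored]


tokens = ["the", "brown", "cat", "is", "sleeping"]
print(remove_ignored_words(tokens, {"the", "is"}))  # ['brown', 'cat', 'sleeping']
```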

Possessive s stemmer

Removes possessive s.

[john's] [rocket] → [john] [rocket]
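
A minimal sketch of possessive stripping, assuming the possessive is written as a trailing 's with either a straight or a curly apostrophe. The name `strip_possessive` is hypothetical.

```python
def strip_possessive(token: str) -> str:
    """Remove a trailing possessive 's (straight or curly apostrophe)."""
    for suffix in ("'s", "\u2019s"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


print([strip_possessive(t) for t in ["john's", "rocket"]])  # ['john', 'rocket']
```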

Stemming overrides

Manipulates the stemming of words, based on rules set up in the stemming overrides dictionaries.

Language stemmer

Shortens words down to a common base form by following the stemming rules of the language.

[dancing] [dance] [dancer] → [danc] [danc] [danc]
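
To show how stemming overrides and the language stemmer could interact, the sketch below consults an override dictionary first and falls back to a stemmer otherwise. The stemmer here is a deliberately naive suffix stripper written for this example, not the algorithm the search engine uses, and the override entry is hypothetical.

```python
# Toy suffix list; real language stemmers apply far more elaborate rules.
SUFFIXES = ("ing", "er", "e")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stem(token: str, overrides: dict[str, str]) -> str:
    """Use the override for a token if one exists, otherwise apply the stemmer."""
    if token in overrides:
        return overrides[token]
    return toy_stem(token)


print([stem(t, {}) for t in ["dancing", "dance", "dancer"]])  # ['danc', 'danc', 'danc']
print(stem("singer", {"singer": "singer"}))  # singer
```

Without the override, this toy stemmer would reduce singer to sing, which illustrates why an override can be useful.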

Hypernyms

Makes some tokens also match alternative tokens, based on rules set up in the hypernyms dictionary.

[pasta] → [pasta]|[lasagne]|[tortelli]

Synonyms

Makes some tokens also match alternative tokens, based on rules set up in the synonyms dictionary.

[space] → [space]|[cosmos]

Irregular words

Makes some tokens also match alternative tokens, based on rules set up in the irregular dictionary.

[woman] → [woman]|[women]
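
The hypernyms, synonyms, and irregular words steps all make a token match alternative tokens, so they can be pictured with one expansion function. The dictionary format and the names `expand_tokens` and `render` are assumptions made for this sketch, not the search engine's API.

```python
def expand_tokens(tokens: list[str], alternatives: dict[str, list[str]]) -> list[list[str]]:
    """For each token, return the token followed by its alternatives from the dictionary."""
    return [[token, *alternatives.get(token, [])] for token in tokens]


def render(expanded: list[list[str]]) -> str:
    """Format expanded tokens in the [a]|[b] notation used on this page."""
    return " ".join("|".join(f"[{t}]" for t in group) for group in expanded)


print(render(expand_tokens(["pasta"], {"pasta": ["lasagne", "tortelli"]})))
# [pasta]|[lasagne]|[tortelli]
print(render(expand_tokens(["woman"], {"woman": ["women"]})))
# [woman]|[women]
```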