Normalization

The normalization engine is designed to standardize commercial parameter data, making different parameters comparable by adjusting them to fit within a range of [0->1]. The engine also groups each normalized value into a partition, based on the parameter type.

To enhance response times, the system periodically pre-calculates and stores these normalized values, eliminating the need for on-the-fly calculations. Any changes made to the commercial parameter data will automatically prompt a new normalization calculation.

flowchart LR
    subgraph parameter["Parameter"]
        direction TB
            A[Partition 1]---B
            B[Partition 2]-.-N["Partition N"];
    end
    ingested[Ingested \n Parameter] --> parameter
    normalized["Normalized \n Value"]
    parameter -- Partition into a \n Normalized Value --->  normalized

Commercial properties

A normalized value is created for each partition, which makes it possible to see how well each item fits within any given partition. Together, all the normalized values for an item are called the commercial properties of that item:

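| Item | Parameter | Partition | Value |
| --- | --- | --- | --- |
| WINE-42 | STOCK | FEW | 1 |
| | STOCK | SOLD OUT | 0 |
| | STOCK | PLENTY | 0 |
| | NEW PRODUCT | 7 DAYS | 0 |
| | NEW PRODUCT | 14 DAYS | 0 |
| | NEW PRODUCT | 21 DAYS | 1 |
| | PRODUCT CLICK | RANK | 0.75 |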

The table above shows an example of how the normalized values could be calculated for a single item.

Parameter normalization

The following section describes how each parameter is normalized, depending on the parameter type.

Term

A term normalization will look at the input value and match the input against the terms defined on partitions to find a fitting partition.

Warning

The matching is case-sensitive. For example, "FEW" and "few" are two distinct values.

The partition whose term matches the input value gets the normalized value 1, and all other partitions of the parameter get 0.

Example of Term Normalization

Given the following SKU with a STOCK parameter:

| Item | Parameter | Ingested Value |
| --- | --- | --- |
| WINE-42 | STOCK | FEW |

Partitions

| Parameter | Partition | Term |
| --- | --- | --- |
| STOCK | FEW | FEW |
| STOCK | SOLD OUT | SOLDOUT |
| STOCK | PLENTY | PLENTY |

Result of Normalization

| Item | Parameter | Partition | Value |
| --- | --- | --- | --- |
| WINE-42 | STOCK | FEW | 1 |
| | STOCK | SOLD OUT | 0 |
| | STOCK | PLENTY | 0 |
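
The term matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's implementation: the function name `normalize_term` and the dictionary layout (partition name mapped to its term) are assumptions made for the example.

```python
def normalize_term(ingested_value, partitions):
    """Return {partition name: 0 or 1} for one ingested value.

    `partitions` maps each partition name to its term (hypothetical layout).
    The comparison is an exact, case-sensitive string match.
    """
    return {
        name: 1 if ingested_value == term else 0
        for name, term in partitions.items()
    }


# Partitions of the STOCK parameter from the example above.
stock_partitions = {"FEW": "FEW", "SOLD OUT": "SOLDOUT", "PLENTY": "PLENTY"}

term_result = normalize_term("FEW", stock_partitions)
print(term_result)

# A lowercase input matches no term, so every partition gets 0.
lowercase_result = normalize_term("few", stock_partitions)
print(lowercase_result)
```

With the input `FEW`, only the FEW partition receives 1. With `few`, no partition matches because the comparison is case-sensitive.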

Range

A range normalization matches the input value against the ranges defined on the partitions to find a fitting partition.

An input value within the range of a partition gets the normalized value 1, and all other partitions of the parameter get 0 as their normalized value.

Example of Range Normalization

Given the following SKU with a NEW PRODUCT parameter:

| Item | Parameter | Ingested Value |
| --- | --- | --- |
| WINE-42 | NEW PRODUCT | 12 |

Partitions

| Parameter | Partition | From | To |
| --- | --- | --- | --- |
| NEW PRODUCT | 7 DAYS | 0 | 8 |
| NEW PRODUCT | 14 DAYS | 8 | 15 |
| NEW PRODUCT | 21 DAYS | 15 | 22 |

Result of Normalization

| Item | Parameter | Partition | Value |
| --- | --- | --- | --- |
| WINE-42 | NEW PRODUCT | 7 DAYS | 0 |
| | NEW PRODUCT | 14 DAYS | 1 |
| | NEW PRODUCT | 21 DAYS | 0 |
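
As a rough sketch, range matching could look like the Python below. The function name `normalize_range` and the data layout are hypothetical. The sketch also assumes that From is inclusive and To is exclusive, which the text does not state; the shared boundaries in the partition table (8 and 15) suggest it, since a half-open interval keeps each value in exactly one partition.

```python
def normalize_range(ingested_value, partitions):
    """Return {partition name: 0 or 1} for one numeric ingested value.

    `partitions` maps each partition name to a (from, to) pair.
    Assumption: the range is half-open, from <= value < to.
    """
    return {
        name: 1 if lower <= ingested_value < upper else 0
        for name, (lower, upper) in partitions.items()
    }


# Partitions of the NEW PRODUCT parameter from the example above.
new_product_partitions = {
    "7 DAYS": (0, 8),
    "14 DAYS": (8, 15),
    "21 DAYS": (15, 22),
}

range_result = normalize_range(12, new_product_partitions)
print(range_result)
```

For the ingested value 12, the sketch assigns 1 to the 14 DAYS partition because 8 <= 12 < 15, and 0 to the other two.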

Rank

A rank normalization uses the input value to numerically sort and rank the items.

The ranking is linear, which means that regardless of how unevenly the ingested parameter values are distributed, the normalized values will be evenly distributed.

Example of Rank Normalization

Given the following SKUs with a CLICKS parameter:

| Item | Parameter | Ingested Value |
| --- | --- | --- |
| WINE-42 | CLICKS | 12 |
| WHISKEY-22 | CLICKS | 50 |
| GIN-10 | CLICKS | 26 |
| RUM-14 | CLICKS | 26 |
| BEER-4 | CLICKS | 30 |

Partitions

| Parameter | Partition |
| --- | --- |
| CLICKS | Rank |

Result of Normalization

| Item | Parameter | Partition | Value |
| --- | --- | --- | --- |
| WINE-42 | CLICKS | Rank | 0.00 |
| WHISKEY-22 | CLICKS | Rank | 1.00 |
| GIN-10 | CLICKS | Rank | 0.50 |
| RUM-14 | CLICKS | Rank | 0.25 |
| BEER-4 | CLICKS | Rank | 0.75 |
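
One way to picture linear ranking is to sort the items by ingested value and spread their positions evenly over [0->1]. The Python sketch below does that. The function name `normalize_rank` is hypothetical, and the handling of ties and of a single-item input are assumptions, since the text does not specify either.

```python
def normalize_rank(values):
    """Return {item: rank in [0, 1]} from {item: numeric ingested value}.

    Items are sorted ascending and position i of n maps to i / (n - 1).
    Assumption: items with equal values still get distinct ranks, ordered
    by their input order (Python's sort is stable).
    """
    ordered = sorted(values, key=values.get)
    count = len(ordered)
    if count == 1:
        # Assumption: a lone item gets rank 0.
        return {ordered[0]: 0.0}
    return {item: position / (count - 1) for position, item in enumerate(ordered)}


clicks = {
    "WINE-42": 12,
    "WHISKEY-22": 50,
    "GIN-10": 26,
    "RUM-14": 26,
    "BEER-4": 30,
}

rank_result = normalize_rank(clicks)
for item, rank in rank_result.items():
    print(f"{item}: {rank:.2f}")
```

The lowest value gets 0.00, the highest gets 1.00, and the steps in between are equal regardless of how far apart the click counts are. GIN-10 and RUM-14 share the value 26, so they occupy the 0.25 and 0.50 slots; which of the two lands on which slot depends on the tie-breaking rule, which is an assumption here.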

Cluster Rank

A cluster rank normalization is similar to a normal rank normalization, but with a key difference. Instead of considering all values, it calculates rank clusters based on a sample of values. The remaining values are then grouped into these clusters.

Clustering groups items based on similarity, resulting in clusters that vary in size. This approach has advantages over rank normalization, especially when dealing with parameters with many similar values.

Advantage over Rank

In a store with 20,000 SKUs and a clicks parameter, if half the products have only one click, rank normalization would spend half of the rank range on 1-click products. Cluster rank creates non-uniform groups, so that all 1-click products get the same rank.
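
The effect can be illustrated with a deliberately simplified Python sketch. It does not implement the engine's clustering, which is not specified here (there is no sampling step, for instance). As a stand-in, `grouped_rank` simply treats each distinct value as its own group, which is enough to show how identical values collapse into one rank. Both function names are hypothetical.

```python
def linear_rank(values):
    """Plain linear rank: every item gets its own slot, even when values tie."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    count = len(values)
    ranks = [0.0] * count
    for position, index in enumerate(order):
        ranks[index] = position / (count - 1) if count > 1 else 0.0
    return ranks


def grouped_rank(values):
    """Stand-in for cluster rank: one group per distinct value (an assumption)."""
    distinct = sorted(set(values))
    groups = len(distinct)
    rank_of = {
        value: (i / (groups - 1) if groups > 1 else 0.0)
        for i, value in enumerate(distinct)
    }
    return [rank_of[value] for value in values]


# Half of the items have exactly one click.
click_counts = [1, 1, 1, 1, 1, 2, 3, 5, 8, 13]

linear = linear_rank(click_counts)
grouped = grouped_rank(click_counts)

print(sorted(set(linear[:5])))   # five different ranks for the 1-click items
print(sorted(set(grouped[:5])))  # a single shared rank for the 1-click items
```

With plain ranking, the five 1-click items are spread over the lower half of the range; with grouping, they all share rank 0 and the rest of the range is left for items that actually differ.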

Cluster rank normalization is significantly faster than normal rank normalization. This makes the cluster rank normalization a great choice when dealing with large datasets or when values are frequently updated.

Limitations

Cluster Rank only works when there are at least 200 values within a commercial parameter.

Proxy

A proxy normalization directly uses the input value as the normalized value.

While any value is accepted, it's recommended that the value falls within the range of [0->1]. This ensures consistency with the normalized values generated for other parameters.
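
A pass-through of this kind is trivial, but a small Python sketch shows where a range check could sit. The function name `normalize_proxy` and the idea of returning an in-range flag are assumptions for illustration; the text only says that any value is accepted.

```python
def normalize_proxy(ingested_value):
    """Return the ingested value unchanged, plus a flag for the [0, 1] range.

    The value is never modified or rejected; the flag only reports whether
    it follows the recommendation (hypothetical helper behaviour).
    """
    value = float(ingested_value)
    within_recommended_range = 0.0 <= value <= 1.0
    return value, within_recommended_range


proxy_ok = normalize_proxy(0.75)
proxy_outside = normalize_proxy(3)
print(proxy_ok)
print(proxy_outside)
```

A caller could use the flag to log or report values outside [0->1] without changing them.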