Normalization
The normalization engine is designed to standardize commercial parameter data, making different parameters comparable by adjusting them to fit within the range [0, 1]. The engine also groups each normalized value into a partition, based on the parameter type.
To enhance response times, the system periodically pre-calculates and stores these normalized values, eliminating the need for on-the-fly calculations. Any changes made to the commercial parameter data will automatically prompt a new normalization calculation.
flowchart LR
subgraph parameter["Parameter"]
direction TB
A[Partition 1]---B
B[Partition 2]-.-N["Partition N"];
end
ingested[Ingested \n Parameter] --> parameter
normalized["Normalized \n Value"]
parameter -- Partition into a \n Normalized Value ---> normalized
Commercial properties
A normalized value is created for each partition, which makes it possible to see how well each item fits within any given partition. Together, all the normalized values for an item are called the commercial properties of that item:
Item | Parameter | Partition | Value |
---|---|---|---|
WINE-42 | STOCK | FEW | 1 |
WINE-42 | STOCK | SOLD OUT | 0 |
WINE-42 | STOCK | PLENTY | 0 |
WINE-42 | NEW PRODUCT | 7 DAYS | 0 |
WINE-42 | NEW PRODUCT | 14 DAYS | 0 |
WINE-42 | NEW PRODUCT | 21 DAYS | 1 |
WINE-42 | PRODUCT CLICK | RANK | 0.75 |
The table above shows an example of how the normalized values could be calculated for a single item.
Parameter normalization
The following section describes how each parameter is normalized, depending on the parameter type.
Term
A term normalization matches the input value against the terms defined on the partitions to find a fitting partition.
Warning
Matching is case-sensitive: for example, "FEW" and "few" are two distinct values.
The partition whose term matches the input value gets the normalized value 1, and all other partitions of the parameter get 0.
Example of Term Normalization
Given the following SKU with a STOCK parameter:
Item | Parameter | Ingested Value |
---|---|---|
WINE-42 | STOCK | FEW |
Partitions
Parameter | Partition | Term |
---|---|---|
STOCK | FEW | FEW |
STOCK | SOLD OUT | SOLDOUT |
STOCK | PLENTY | PLENTY |
Result of Normalization
Item | Parameter | Partition | Value |
---|---|---|---|
WINE-42 | STOCK | FEW | 1 |
WINE-42 | STOCK | SOLD OUT | 0 |
WINE-42 | STOCK | PLENTY | 0 |
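The term matching described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the engine's implementation: the function name `normalize_term` and the dictionary layout (partition name mapped to its term) are assumptions made for the example.

```python
def normalize_term(value: str, partitions: dict[str, str]) -> dict[str, int]:
    """Return 1 for the partition whose term equals the input, 0 for the rest.

    `partitions` maps a partition name to its term (hypothetical layout).
    The comparison is an exact, case-sensitive string match.
    """
    return {name: int(value == term) for name, term in partitions.items()}


# Partitions of the STOCK parameter from the example above.
stock_partitions = {"FEW": "FEW", "SOLD OUT": "SOLDOUT", "PLENTY": "PLENTY"}

term_result = normalize_term("FEW", stock_partitions)
print(term_result)  # {'FEW': 1, 'SOLD OUT': 0, 'PLENTY': 0}

# Because matching is case-sensitive, "few" matches no partition.
print(normalize_term("few", stock_partitions))
```

In this sketch, an input that matches no term simply yields 0 for every partition; how the engine treats unmatched input is not stated here.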
Range
A range normalization matches the input value against the ranges defined on the partitions to find a fitting partition.
An input value within the range of a partition gets the normalized value 1, and all other partitions of the parameter get 0 as their normalized value.
Example of Range Normalization
Given the following SKU with a NEW PRODUCT parameter:
Item | Parameter | Ingested Value |
---|---|---|
WINE-42 | NEW PRODUCT | 12 |
Partitions
Parameter | Partition | From | To |
---|---|---|---|
NEW PRODUCT | 7 DAYS | 0 | 8 |
NEW PRODUCT | 14 DAYS | 8 | 15 |
NEW PRODUCT | 21 DAYS | 15 | 22 |
Result of Normalization
Item | Parameter | Partition | Value |
---|---|---|---|
WINE-42 | NEW PRODUCT | 7 DAYS | 0 |
WINE-42 | NEW PRODUCT | 14 DAYS | 1 |
WINE-42 | NEW PRODUCT | 21 DAYS | 0 |
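As a rough sketch, range matching can be expressed as a bounds check per partition. The function name `normalize_range` is hypothetical, and the sketch assumes each range is half-open, that is, From is inclusive and To is exclusive. The partition table suggests this because adjacent ranges share a boundary, but it is not stated explicitly.

```python
def normalize_range(value: float, partitions: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Return 1 for the partition whose range contains the value, 0 otherwise.

    `partitions` maps a partition name to (from, to). The range is treated
    as half-open, from <= value < to, which is an assumption of this sketch.
    """
    return {name: int(lo <= value < hi) for name, (lo, hi) in partitions.items()}


# Partitions of the NEW PRODUCT parameter from the example above.
new_product_partitions = {"7 DAYS": (0, 8), "14 DAYS": (8, 15), "21 DAYS": (15, 22)}

range_result = normalize_range(12, new_product_partitions)
print(range_result)  # {'7 DAYS': 0, '14 DAYS': 1, '21 DAYS': 0}
```

An ingested value of 12 lies between 8 and 15, so only the 14 DAYS partition receives the value 1.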
Rank
A rank normalization uses the input value to numerically sort and rank the items.
The ranking is linear, which means that regardless of how unevenly the ingested parameter values are distributed, the normalized values will be evenly distributed.
Example of Rank Normalization
Given the following SKUs with a CLICKS parameter:
Item | Parameter | Ingested Value |
---|---|---|
WINE-42 | CLICKS | 12 |
WHISKEY-22 | CLICKS | 50 |
GIN-10 | CLICKS | 26 |
RUM-14 | CLICKS | 26 |
BEER-4 | CLICKS | 30 |
Partitions
Parameter | Partition |
---|---|
CLICKS | Rank |
Result of Normalization
Item | Parameter | Partition | Value |
---|---|---|---|
WINE-42 | CLICKS | Rank | 0.00 |
WHISKEY-22 | CLICKS | Rank | 1.00 |
GIN-10 | CLICKS | Rank | 0.50 |
RUM-14 | CLICKS | Rank | 0.25 |
BEER-4 | CLICKS | Rank | 0.75 |
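A minimal sketch of linear ranking: sort the items by ingested value and spread their positions evenly over [0, 1]. The name `normalize_rank` is hypothetical. The tie-breaking rule is also an assumption: the table above shows that tied items (GIN-10 and RUM-14) receive distinct ranks, but not how the order between them is chosen, so this sketch keeps input order and the two tied items may swap places relative to the table.

```python
def normalize_rank(values: dict[str, float]) -> dict[str, float]:
    """Map each item to position / (n - 1) after sorting by ingested value.

    Ties keep their input order because Python's sort is stable
    (an assumption; the actual tie-break rule is not documented here).
    """
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}  # single item: rank chosen arbitrarily for the sketch
    return {item: position / (n - 1) for position, item in enumerate(ordered)}


clicks = {"WINE-42": 12, "WHISKEY-22": 50, "GIN-10": 26, "RUM-14": 26, "BEER-4": 30}

rank_result = normalize_rank(clicks)
for item, rank in rank_result.items():
    print(f"{item}: {rank:.2f}")
```

The ingested values are unevenly spaced (12, 26, 26, 30, 50), yet the resulting ranks are evenly spaced in steps of 0.25.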
Cluster Rank
A cluster rank normalization is similar to a normal rank normalization, but with a key difference. Instead of considering all values, it calculates rank clusters based on a sample of values. The remaining values are then grouped into these clusters.
Clustering groups items based on similarity, resulting in clusters that vary in size. This approach has advantages over rank normalization, especially when dealing with parameters with many similar values.
Advantage over Rank
In a store with 20,000 SKUs and a clicks parameter, if half the products have only one click, rank normalization would spend half of the rank range on 1-click products. Cluster rank creates non-uniform groups, so that all 1-click products get the same rank.
Cluster rank normalization is significantly faster than normal rank normalization. This makes the cluster rank normalization a great choice when dealing with large datasets or when values are frequently updated.
Limitations
Cluster rank only works when there are at least 200 values within a commercial parameter.
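The exact clustering algorithm is not described here, so the following is only one possible way to illustrate the idea: derive cluster boundaries from a random sample, then place every value into a cluster with a binary search. Everything in it is an assumption made for the example, including the function name `normalize_cluster_rank`, the use of sample quantiles as boundaries, and the `n_clusters`, `sample_size` and `seed` parameters. Only the 200-value minimum comes from the limitation above.

```python
import bisect
import random


def normalize_cluster_rank(values, n_clusters=10, sample_size=200, min_values=200, seed=0):
    """Illustrative cluster rank: boundaries from a sample, then bucket all values."""
    if len(values) < min_values:
        raise ValueError(f"cluster rank needs at least {min_values} values")

    # Only the sample is sorted, not the full dataset.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))

    # Quantile cut points taken from the sample. Using a set collapses
    # repeated cut points, so heavily duplicated values share one cluster.
    cuts = sorted({sample[i * len(sample) // n_clusters] for i in range(1, n_clusters)})
    top = len(cuts)  # index of the highest cluster
    if top == 0:
        return [0.0] * len(values)

    # Equal values always land in the same cluster and get the same rank.
    return [bisect.bisect_left(cuts, v) / top for v in values]


# 1,000 products: half have exactly one click, the rest have 2..501 clicks.
cluster_clicks = [1] * 500 + list(range(2, 502))
cluster_ranks = normalize_cluster_rank(cluster_clicks)
print(set(cluster_ranks[:500]))  # all 1-click products share a single rank
```

In this sketch only the sample is sorted, and each remaining value costs one binary search, which is one plausible reason a sample-based approach can be cheaper than ranking every value.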
Proxy
A proxy normalization directly uses the input value as the normalized value.
While any value is accepted, it's recommended that the value falls within the range [0, 1]. This ensures consistency with the normalized values generated for other parameters.