Normalization

The normalization engine standardizes commercial parameter data by scaling each value into the range [0, 1], which makes different parameters comparable. The engine also assigns each normalized value to a partition of its parameter, and how partitions are defined depends on the parameter type.

To improve response times, the system periodically pre-calculates and stores the normalized values, so they do not have to be calculated on the fly. Any change to the commercial parameter data automatically triggers a new normalization run.
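The pre-calculate-and-store behaviour described above can be sketched as a small cache. This is a minimal sketch under stated assumptions, not the engine's implementation: the class `PrecomputedNormalization`, its `ingest`, `refresh` and `lookup` methods, and the `normalize_stock` helper are all hypothetical. `refresh` stands in for the periodic run, and `ingest` triggers a run only when the stored data actually changes.

```python
from typing import Callable, Dict

RawValues = Dict[str, object]             # item -> ingested parameter value
Normalized = Dict[str, Dict[str, float]]  # item -> {partition: normalized value}


class PrecomputedNormalization:
    """Hypothetical store that serves pre-calculated normalized values."""

    def __init__(self, normalize: Callable[[RawValues], Normalized]) -> None:
        self._normalize = normalize
        self._raw: RawValues = {}
        self._cache: Normalized = {}
        self.runs = 0  # number of normalization runs, for illustration only

    def ingest(self, item: str, value: object) -> None:
        # A change to the parameter data triggers a new normalization run.
        changed = item not in self._raw or self._raw[item] != value
        self._raw[item] = value
        if changed:
            self.refresh()

    def refresh(self) -> None:
        # Recompute the whole parameter: some normalizations (e.g. rank)
        # depend on every item, not just the one that changed.
        self._cache = self._normalize(dict(self._raw))
        self.runs += 1

    def lookup(self, item: str) -> Dict[str, float]:
        # Reads are served from the cache; nothing is calculated on the fly.
        return self._cache.get(item, {})


# Usage with a hypothetical normalizer for a STOCK term parameter.
def normalize_stock(raw: RawValues) -> Normalized:
    terms = {"FEW": "FEW", "SOLD OUT": "SOLDOUT", "PLENTY": "PLENTY"}
    return {item: {p: 1.0 if v == t else 0.0 for p, t in terms.items()}
            for item, v in raw.items()}


store = PrecomputedNormalization(normalize_stock)
store.ingest("WINE-42", "FEW")  # data changed -> one normalization run
store.ingest("WINE-42", "FEW")  # unchanged -> served from the cache
print(store.runs, store.lookup("WINE-42")["FEW"])  # 1 1.0
```

Recomputing the full parameter on every change keeps the sketch simple and correct for rank-style normalizations; a real system would likely batch changes.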

flowchart LR
    subgraph parameter["Parameter"]
        direction TB
            A[Partition 1]---B
            B[Partition 2]-.-N["Partition N"];
    end
    ingested[Ingested \n Parameter] --> parameter
    normalized["Normalized \n Value"]
    parameter -- Partition into a \n Normalized Value --->  normalized

Commercial properties

A normalized value is created for each partition, which makes it possible to see how well each item fits within any given partition. Together, all the normalized values for an item are called the commercial properties of that item:

Item	Parameter	Partition	Value
WINE-42	STOCK	FEW	1
	STOCK	SOLD OUT	0
	STOCK	PLENTY	0
	NEW PRODUCT	7 DAYS	0
	NEW PRODUCT	14 DAYS	0
	NEW PRODUCT	21 DAYS	1
	PRODUCT CLICK	RANK	0.75

The table above shows an example of how the normalized values could be calculated for a single item.
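One way to hold commercial properties in code is a mapping from (parameter, partition) to normalized value. The sketch below copies the WINE-42 values from the table above; the `CommercialProperties` type and the `fit` and `best_partition` helpers are hypothetical illustrations, not part of any documented API.

```python
from typing import Dict, Tuple

# Hypothetical representation: (parameter, partition) -> normalized value.
CommercialProperties = Dict[Tuple[str, str], float]

# Values copied from the WINE-42 example table.
wine_42: CommercialProperties = {
    ("STOCK", "FEW"): 1.0,
    ("STOCK", "SOLD OUT"): 0.0,
    ("STOCK", "PLENTY"): 0.0,
    ("NEW PRODUCT", "7 DAYS"): 0.0,
    ("NEW PRODUCT", "14 DAYS"): 0.0,
    ("NEW PRODUCT", "21 DAYS"): 1.0,
    ("PRODUCT CLICK", "RANK"): 0.75,
}


def fit(props: CommercialProperties, parameter: str, partition: str) -> float:
    """How well the item fits a partition (0.0 if the pair is unknown)."""
    return props.get((parameter, partition), 0.0)


def best_partition(props: CommercialProperties, parameter: str) -> str:
    """The partition of `parameter` with the highest normalized value."""
    candidates = {part: v for (param, part), v in props.items() if param == parameter}
    return max(candidates, key=candidates.__getitem__)


print(fit(wine_42, "PRODUCT CLICK", "RANK"))  # 0.75
print(best_partition(wine_42, "NEW PRODUCT"))  # 21 DAYS
```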

Parameter normalization

The following section describes how each parameter is normalized, depending on the parameter type.

Term

A term normalization matches the input value against the terms defined on the partitions to find the matching partition.

Warning

The matching is case-sensitive; for example, "FEW" and "few" are two distinct values.

The partition whose term matches the input value gets the normalized value 1, and all other partitions of the parameter get 0.

Example of Term Normalization

Given the following SKU with a STOCK parameter:

Item	Parameter	Ingested Value
WINE-42	STOCK	FEW

Partitions

Parameter	Partition	Term
STOCK	FEW	FEW
STOCK	SOLD OUT	SOLDOUT
STOCK	PLENTY	PLENTY

Result of Normalization

Item	Parameter	Partition	Value
WINE-42	STOCK	FEW	1
	STOCK	SOLD OUT	0
	STOCK	PLENTY	0
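The term rule can be sketched as an exact string comparison per partition. This is a minimal illustration; `term_normalize` and `stock_partitions` are hypothetical names, and the behaviour when no term matches (every partition gets 0) is an assumption, since the text does not cover that case.

```python
from typing import Dict


def term_normalize(value: str, partitions: Dict[str, str]) -> Dict[str, float]:
    """Term normalization sketch.

    `partitions` maps partition name -> term. The partition whose term equals
    the ingested value gets 1.0, every other partition gets 0.0. The comparison
    is an exact string match, so it is case-sensitive ("FEW" != "few").
    If nothing matches, all partitions get 0.0 (assumption).
    """
    return {name: 1.0 if value == term else 0.0 for name, term in partitions.items()}


stock_partitions = {"FEW": "FEW", "SOLD OUT": "SOLDOUT", "PLENTY": "PLENTY"}
print(term_normalize("FEW", stock_partitions))
# {'FEW': 1.0, 'SOLD OUT': 0.0, 'PLENTY': 0.0}
```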

Range

A range normalization matches the input value against the ranges defined on the partitions to find a fitting partition.

An input value within the range of a partition gets the normalized value 1, and all other partitions of the parameter get 0 as their normalized value.

Example of Range Normalization

Given the following SKU with a NEW PRODUCT parameter:

Item	Parameter	Ingested Value
WINE-42	NEW PRODUCT	12

Partitions

Parameter	Partition	From	To
NEW PRODUCT	7 DAYS	0	8
NEW PRODUCT	14 DAYS	8	15
NEW PRODUCT	21 DAYS	15	22

Result of Normalization

Item	Parameter	Partition	Value
WINE-42	NEW PRODUCT	7 DAYS	0
	NEW PRODUCT	14 DAYS	1
	NEW PRODUCT	21 DAYS	0
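The range rule can be sketched as a bounds check per partition. Adjacent partitions share a boundary (8 ends 7 DAYS and starts 14 DAYS), so this sketch assumes From is inclusive and To is exclusive; the text does not state the boundary rule, and `range_normalize` is a hypothetical name. For the ingested value 12, only the 14 DAYS partition (8 to 15) matches.

```python
from typing import Dict, Tuple


def range_normalize(value: float, partitions: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Range normalization sketch.

    `partitions` maps partition name -> (from, to). ASSUMPTION: `from` is
    inclusive and `to` is exclusive, so a value on a shared boundary falls
    into exactly one partition.
    """
    return {
        name: 1.0 if low <= value < high else 0.0
        for name, (low, high) in partitions.items()
    }


new_product = {"7 DAYS": (0, 8), "14 DAYS": (8, 15), "21 DAYS": (15, 22)}
print(range_normalize(12, new_product))
# {'7 DAYS': 0.0, '14 DAYS': 1.0, '21 DAYS': 0.0}
```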

Rank

A rank normalization uses the input value to numerically sort and rank the items.

The ranking is linear: regardless of how unevenly the ingested parameter values are distributed, the normalized values are evenly spaced between 0 and 1.

Example of Rank Normalization

Given the following SKUs with a CLICKS parameter:

Item	Parameter	Ingested Value
WINE-42	CLICKS	12
WHISKEY-22	CLICKS	50
GIN-10	CLICKS	26
RUM-14	CLICKS	26
BEER-4	CLICKS	30

Partitions

Parameter	Partition
CLICKS	Rank

Result of Normalization

Item	Parameter	Partition	Value
WINE-42	CLICKS	Rank	0.00
WHISKEY-22	CLICKS	Rank	1.00
GIN-10	CLICKS	Rank	0.50
RUM-14	CLICKS	Rank	0.25
BEER-4	CLICKS	Rank	0.75
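Linear ranking can be sketched by sorting the items by value and scaling each item's position to [0, 1] as position / (n - 1), which reproduces the evenly spaced values in the example. The text does not say how ties are broken: the example table gives RUM-14 the lower rank, whereas this sketch keeps input order for equal values (Python's sort is stable), so GIN-10 comes first. `rank_normalize` is a hypothetical name, and giving a single item the rank 0.0 is an assumption.

```python
from typing import Dict, List


def rank_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Linear rank normalization sketch.

    Items are sorted by value and their 0-based position is scaled to [0, 1]
    as position / (n - 1), so the ranks are evenly spaced however the raw
    values are distributed. Equal values keep their input order (stable sort).
    """
    order: List[str] = sorted(values, key=values.__getitem__)
    denom = max(len(order) - 1, 1)  # assumption: a lone item gets rank 0.0
    return {item: pos / denom for pos, item in enumerate(order)}


clicks = {"WINE-42": 12, "WHISKEY-22": 50, "GIN-10": 26, "RUM-14": 26, "BEER-4": 30}
print(rank_normalize(clicks))
# {'WINE-42': 0.0, 'GIN-10': 0.25, 'RUM-14': 0.5, 'BEER-4': 0.75, 'WHISKEY-22': 1.0}
```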

Cluster rank

A cluster rank normalization is similar to a regular rank normalization, with one key difference: instead of considering all values, it calculates rank clusters from a sample of the values and then assigns the remaining values to these clusters.

Clustering groups items by similarity, so the resulting clusters vary in size. This gives cluster rank an advantage over rank normalization, especially for parameters with many similar values.

Advantage over Rank

In a store with 20,000 SKUs and a clicks parameter where half of the products have only one click, rank normalization would spend half of the rank range on 1-click products. Cluster rank instead creates non-uniform groups, so all 1-click products get the same rank.
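The text does not name the clustering method cluster rank uses, so the sketch below swaps in a simple stand-in rather than the engine's algorithm: the distinct values in a random sample become cluster boundaries, every value joins the cluster of the nearest boundary at or below it, and the clusters are ranked linearly. The names `cluster_rank_normalize`, `sample_size` (default 100) and `seed` are hypothetical; `MIN_VALUES` mirrors the documented 200-value minimum. Because identical values always land in the same cluster, all 1-click products share one rank, as in the 20,000-SKU example.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Optional

MIN_VALUES = 200  # documented minimum number of values for cluster rank


def cluster_rank_normalize(
    values: Dict[str, float], sample_size: int = 100, seed: Optional[int] = 0
) -> Dict[str, float]:
    """Cluster rank sketch (stand-in algorithm, not the engine's own).

    1. Draw a random sample of the values.
    2. Use the distinct sampled values as cluster boundaries.
    3. Assign every value to the cluster with the largest boundary <= value;
       values below the smallest boundary join the first cluster.
    4. Rank clusters linearly: cluster_index / (number_of_clusters - 1).
    """
    if len(values) < MIN_VALUES:
        raise ValueError(f"cluster rank needs at least {MIN_VALUES} values")
    rng = random.Random(seed)
    sample = rng.sample(list(values.values()), min(sample_size, len(values)))
    bounds: List[float] = sorted(set(sample))
    denom = max(len(bounds) - 1, 1)
    result: Dict[str, float] = {}
    for item, v in values.items():
        # Index of the largest boundary <= v (clamped to the first cluster).
        idx = max(bisect_right(bounds, v) - 1, 0)
        result[item] = idx / denom
    return result


# Half of the SKUs have exactly one click; the rest have distinct counts.
skus = {f"SKU-{i}": (1 if i < 1000 else i) for i in range(2000)}
ranks = cluster_rank_normalize(skus)
print(len({ranks[f"SKU-{i}"] for i in range(1000)}))  # 1
```

Only the sample is sorted, and each remaining value is placed with a binary search, which is one plausible reason a sampling approach scales better than sorting every value.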

Cluster rank normalization is also significantly faster than regular rank normalization, which makes it a good choice for large datasets or for values that are updated frequently.

Limitations

Cluster rank only works when a commercial parameter has at least 200 values.

Proxy

A proxy normalization directly uses the input value as the normalized value.

Any value is accepted, but the value should preferably fall within the range [0, 1] so that it is consistent with the normalized values generated for other parameters.
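A proxy normalization is a passthrough. In the minimal sketch below, `proxy_normalize` is a hypothetical name, and emitting a warning for values outside [0, 1] is an assumption about how one might surface the recommendation; the value itself is still returned unchanged.

```python
import warnings


def proxy_normalize(value: float) -> float:
    """Proxy normalization sketch: the ingested value is used as-is.

    Any number is accepted. Values outside [0, 1] (including NaN) only
    trigger a warning here (assumption), because they will not line up
    with the normalized values of other parameters.
    """
    if not 0.0 <= value <= 1.0:
        warnings.warn(f"proxy value {value} is outside [0, 1]")
    return value


print(proxy_normalize(0.4))  # 0.4
```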