Normalization of Discrete Data

  • Indicator: the encoded string length is equal to n-1 where n is a count of unique values. A separate column is assigned to each unique field value, with the exception of the reference category. Value 1 is recorded in it, whereas all other string values are equal to 0. The reference category is encoded using the string that consists of zeros.
  • Indicator (without reference category): the encoded string length is equal to n. The reference value is not used.
  • Deviation means the same as the "Indicator", but the reference category is encoded using the string that consists of -1.
  • Simple means the same as the "Indicator", but value -1/n is used instead of 0, and value 1-1/n is used instead of 1.
  • Difference: the unique values are ordered, then each unique value is compared with the next one according to the following table:
Unique values Contrast 1 Contrast 2 Contrast 3
U1 vs. U2 U2 vs. U3 U3 vs. U4
U1 (n-1)/n (n-2)/n (n-3)/n
U2 -1/n (n-2)/n (n-3)n
U3 -1/n (n-2)/n (n-3)/n
U4 -1/n -2/n -3/n
  • Backward difference: the unique values are ordered, then each unique value is compared with the previous one according to the following table:
Unique values Contrast 1 Contrast 2 Contrast 3
U1 vs. U2 U2 vs. U3 U3 vs. U4
U1 -(n-1)/n -(n-2)/n -(n-3)/n
U2 1/n -(n-2)/n -(n-3)n
U3 1/n 2/n -(n-3)/n
U4 1/n 2/n 3/n
  • Helmert: the unique values are ordered, then each unique value is compared with the average value from the subsequent values according to the following table:
Unique values Contrast 1 Contrast 2 Contrast 3
U1 vs. U2, U3 and U4 U2 vs. U3 and U4 U3 vs. U4
U1 (n-1)/n 0 0
U2 -1/n (n-2)/n 0
U3 -1/n -1/(n-1) (n-3)/n
U4 -1/n -1/(n-1) -1/(n-2)
  • Reverse Helmert: the unique values are ordered, then each unique value is compared with the average value from the previous values according to the following table:
Unique values Contrast 1 Contrast 2 Contrast 3
U2 vs. U1 U3 vs. U1 and U2 U4 vs. U1, U2 and U3
U1 -1/(n-2) -1/(n-1) -1/n
U2 (n-2)/n -1/(n-1) -1/n
U3 0 (n-2)/n) -1/n
U4 0 0 (n-1)/n
  • Unique value index: each unique value is replaced with the unique value index.

Reference Value

The reference value is used in some normalizers of discrete data. The reference value is selected from the unique values table. There are several options to set the reference value:

  • First listed.
  • Last listed.
  • Rarer.
  • More frequent.
  • Explicitly set.

results matching ""

    No results matching ""