Chapter 17. Predict Functions

LogiQL provides a special form of rule (a predict rule) to evaluate basic Factorization Machine (FaMa) models [Rendle 2010]. Predict rules can also be used to export data in paperboat format, which is the default data format of Infor’s light-weight factorization machine package (a.k.a. lwfm). The full grammar of predict rules is given in Section 17.2, “The general form of predict rules”. Restrictions on the elements of predict rules are discussed in detail in Section 17.3, “Restrictions and requirements”. We conclude the chapter with a number of examples that help the reader become familiar with the construction of predict rules for several use cases.

Note

The definition of predict rules in Section 17.2, “The general form of predict rules” may seem intimidating. We encourage the reader to examine the examples in order to develop some basic intuitions, and only then to consult Section 17.2, “The general form of predict rules” and Section 17.3, “Restrictions and requirements” for additional clarifications.

17.1. Equations of models supported by predict functions

17.1.1. Factorization Machine (FaMa) model

This is the equation of the basic FaMa model of degree d [Rendle 2010]:

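Equation 17.1. Basic Factorization Machine model

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d} \sum_{i_1=1}^{n} \cdots \sum_{i_l=i_{l-1}+1}^{n} \left( \prod_{j=1}^{l} x_{i_j} \right) \left( \sum_{f=1}^{k_l} \prod_{j=1}^{l} v^{(l)}_{i_j,f} \right)$$

where $n$ is the total number of input variables (also known as features), $k_l$ is the dimensionality of the factorization (also known as rank), and the model parameters are the bias $w_0$, the first-order/linear coefficients (alpha) $w_i$, and the $l$-order coefficients (beta) $V^{(l)} \in \mathbb{R}^{n \times k_l}$.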

Currently LogiQL supports this general form for d up to 4.
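
To make the model concrete, the following is a minimal Python sketch of a naive evaluator for the d-way factorization machine as defined in [Rendle 2010]. It is not how LogiQL evaluates predict rules. The function name `fama_predict`, the dictionary layout, and the sample numbers are illustrative assumptions, and the sketch assumes that absent features contribute nothing.

```python
from itertools import combinations
from math import prod


def fama_predict(x, bias, alpha, factors):
    """Naively evaluate a d-way factorization machine on one data point.

    x       : dict feature -> value (absent features count as 0)
    bias    : float, the bias term
    alpha   : dict feature -> first-order coefficient
    factors : dict order l (2, 3, ...) -> dict feature -> list of k_l floats
    """
    result = bias + sum(alpha.get(i, 0.0) * xi for i, xi in x.items())
    for order, v in factors.items():
        # Only features that have both a value and factors of this order contribute.
        active = sorted(i for i in x if i in v)
        rank = len(next(iter(v.values()))) if v else 0
        for combo in combinations(active, order):
            x_prod = prod(x[i] for i in combo)
            v_sum = sum(prod(v[i][f] for i in combo) for f in range(rank))
            result += x_prod * v_sum
    return result


# Hypothetical second-order model with rank 2.
x = {"a": 1.0, "b": 2.0}
y = fama_predict(
    x,
    bias=0.5,
    alpha={"a": 1.0, "b": 0.5},
    factors={2: {"a": [1.0, 2.0], "b": [3.0, 0.5]}},
)
print(y)  # 0.5 + (1.0 + 1.0) + (1 * 2) * (1 * 3 + 2 * 0.5) = 10.5
```

Enumerating all feature combinations is exponential in the order and is used here only because it mirrors the definition term by term.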

Unless stated otherwise, in this chapter is used to denote the model evaluated by predict functions, and or is used to denote the basic form of the FaMa model.

17.2. The general form of predict rules

The EBNF grammar below is a little more restrictive than necessary. The comma-separated items in ExportParameterSet and EvalParameterSet can be given in any order, but must all be present. The same is true of the items in RequiredConfig (which are separated by vertical bars). The order of elements in the various disjunctions and conjunctions in the body can also be changed.

The named identifiers (e.g., ModelKey) are intended not only to guide the intuition, but often also to indicate that the same identifier is to be used in several places (e.g., the same ModelKey in EvalParameterSet and CoefficientConjunction). Note, however, that this is not always the case: in particular Var stands just for a variable name.

PredictRule = ExportHeadAtom "<-"
                "predict" "<<" ExportParameterSet ">>" ExportBodyConjunction "."
            | EvalHeadAtom "<-"
                "predict" "<<" EvalParameterSet ">>" EvalBodyConjunction "." .

ExportHeadAtom = ExportStatePredicateName "[" "]" "=" BooleanVariable .

EvalHeadAtom =
    EvalResultPredicateName "[" ObservationKey { "," ObservationKey } "]"
      "=" DecimalVariable .

ExportParameterSet = "mode" "=" "export"
                     "," "format"           "=" "paperboat"
                     "," "configurations"   "=" RequiredConfig
                     "," "observation_keys" "=" ObservationKeys
                     "," "feature_key"      "=" FeatureKeyKey
                     "," "feature_value"    "=" FeatureValueKey
                     "," "sales_value"      "=" SalesValueKey .

EvalParameterSet   = "mode" "=" "eval"
                     "," "model"            "=" "multi_model"
                     "," "model_key"        "=" ModelKey
                     "," "coefficients"     "=" CoefficientPredicateName
                     "," "observation_keys" "=" ObservationKeys
                     "," "feature_key"      "=" FeatureKeyKey
                     "," "feature_value"    "=" FeatureValueKey .

RequiredConfig = "{" "file_path" "=" FilePathPredicateName
                 "|" "header"    "=" HeaderPredicateName
                 "|" "weight"    "=" WeightPredicateName
                 "}" .

ObservationKeys = "[" ObservationKey { "|" ObservationKey } "]" .


ExportBodyConjunction = RequiredConfigDisjunction ","
                        ObservationConjunction "," ExportFeatureDisjunction .

EvalBodyConjunction = CoefficientConjunction ","
                      ObservationConjunction "," EvalFeatureDisjunction .

RequiredConfigDisjunction =  "(" FilePathPredicateName "[" "]" "=" Var
                             ";" HeaderPredicateName   "[" "]" "=" Var
                             ";" WeightPredicateName   "[" "]" "=" Var
                             ")" .

CoefficientConjunction =
  CoefficientPredicateName "[" ModelKeyValue "," Var "," Var "," Var "]" "=" Var
  "," DomainPredicateName "(" ObservationKey  { "," ObservationKey } ")"
  "," ( ModelKey "=" ModelKeyValue
      | ModelMappingPredicateName "[" ObservationKey { "," ObservationKey } "]"
          "=" ModelKey
      ).

ExportFeatureDisjunction =
      "(" TargetConjunction
      ";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
      ")" .

EvalFeatureDisjunction =
      "(" FeatureConjunctionFormula
      ";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
      ")" .

ObservationConjunction    = Conjunction .
TargetConjunction         = Conjunction .
FeatureConjunctionFormula = Conjunction .

ModelKeyValue = StringLiteral .

Var                      = Identifier .
BooleanVariable          = Identifier .
DecimalVariable          = Identifier .
FeatureKeyKey            = Identifier .
FeatureValueKey          = Identifier .
SalesValueKey            = Identifier .
ObservationKey           = Identifier .
CoefficientPredicateName = Identifier .
DomainPredicateName      = Identifier .
FilePathPredicateName    = Identifier .
HeaderPredicateName      = Identifier .
WeightPredicateName      = Identifier .
ModelKey                  = Identifier .
ModelMappingPredicateName = Identifier .
ExportStatePredicateName  = Identifier .
EvalResultPredicateName   = Identifier .

It is worth noting that predict rules enjoy the native parallel support of LogiQL and are incrementally maintained.

Deprecated form of predict rules

There is, however, an obsolete form of predict rules which, though still supported, is not incrementally maintained. This may result in excessively costly evaluation, so that form of predict rules should be avoided whenever possible.

The syntax of that obsolete form of predict rules is given below for completeness. Notice that in EvalParameterSet we have model = fmdirect or model = lfmdirect instead of model = multi_model .

Note

Support for this form of predict rules will be removed in a future version of LogicBlox.

PredictRuleObsolete =
       ExportHeadAtom "<-"
             "predict" "<<" ExportParameterSet ">>" ExportBodyConjunction "."
     | EvalHeadAtom "<-"
             "predict" "<<" EvalParameterSet ">>" EvalBodyConjunction "." .

ExportHeadAtom = ExportStatePredicateName "[" "]" "=" BooleanVariable .

EvalHeadAtom =
   EvalResultPredicateName "[" ObservationKey { "," ObservationKey } "]" "="
     DecimalVariable .

ExportParameterSet = "mode" "=" "export"
                     "," "format" "=" "paperboat"
                     "," "configurations" "=" RequiredConfig
                     "," "observation_keys" "=" ObservationKeys
                     "," "feature_key" "=" FeatureKeyKey
                     "," "feature_value" "=" FeatureValueKey
                     "," "sales_value" "=" SalesValueKey .

EvalParameterSet = "mode" "=" "eval"
                   "," ModelCoefficients
                   "," "observation_keys" "=" ObservationKeys
                   "," "feature_key" "=" FeatureKeyKey
                   "," "feature_value" "=" FeatureValueKey .

RequiredConfig = "{" "file_path" "=" FilePathPredicateName
                 "|" "header" "=" HeaderPredicateName
                 "|" "weight" "=" WeightPredicateName
                 "}" .

ObservationKeys = "[" ObservationKey { "|" ObservationKey } "]" .

ModelCoefficients = FMModelCoefficients | LFMModelCoefficients .

FMModelCoefficients = "model" "=" "fmdirect"
                      "," "coefficients" "=" "{" CommonCoefficients "}" .

LFMModelCoefficients =
      "model" "=" "lfmdirect"
      "," "coefficients" "=" "{" CommonCoefficients "|" LFMCoefficients "}" .

CommonCoefficients = "bias" "=" BiasPredicateName
                     "|" "alpha" "=" AlphaPredicateName
                     "|" "beta" "=" BetaPredicateName .

LFMCoefficients = "centroid" "=" CentroidPredicateName
                  "|" "alpha" "=" AlphaPredicateName .

ExportBodyConjunction = RequiredConfigDisjunction
                        "," ObservationConjunction
                        "," ExportFeatureDisjunction .

EvalBodyConjunction = CoefficientDisjunction
                      "," ObservationConjunction
                      "," EvalFeatureDisjunction .

RequiredConfigDisjunction = "(" FilePathPredicateName "[" "]" "=" Var
                            ";" HeaderPredicateName   "[" "]" "=" Var
                            ";" WeightPredicateName   "[" "]" "=" Var
                            ")" .

CoefficientDisjunction =
      "(" BiasPredicateName  "[" [ Var ] "]" "=" Var
      ";" AlphaPredicateName "[" FeatureKey [ "," Var ] "]" "=" Var
      ";" BetaPredicateName  "[" FeatureKey  [ "," Var ] "," Var "]" "=" Var
      { ";" CentroidPredicateName "[" Var "," Var "]" "=" Var }
      ")" .

ExportFeatureDisjunction =
      "(" TargetConjunction
      ";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
      ")" .

EvalFeatureDisjunction =
      "(" FeatureConjunctionFormula
      ";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
      ")" .

ObservationConjunction    = Conjunction .
TargetConjunction         = Conjunction .
FeatureConjunctionFormula = Conjunction .

BooleanVariable       = Identifier .
DecimalVariable       = Identifier .
ObservationKey        = Identifier .
FeatureKeyKey         = Identifier .
FeatureValueKey       = Identifier .
SalesValueKey         = Identifier .
FilePathPredicateName = Identifier .
HeaderPredicateName   = Identifier .
WeightPredicateName   = Identifier .
BiasPredicateName     = Identifier .
AlphaPredicateName    = Identifier .
BetaPredicateName     = Identifier .
CentroidPredicateName = Identifier .

17.3. Restrictions and requirements

In this section we provide further restrictions on several elements of a predict rule. These restrictions, along with the imposed rigid form of predict rules, reflect either the assumed rule structure in the implementation of predict functions or the requirements of the FaMa model formalism.

The head atom

Only one atom is allowed in the head of a predict rule.

ExportHeadAtom refers to a predicate that contains one value of type boolean: it will be populated either with true or with false, depending on the outcome of rule evaluation.

EvalHeadAtom refers to a functional predicate whose keys are the full set of observation keys, as defined in EvalParameterSet, and whose value is of type decimal. The predict rule will populate this predicate with the value of the model for each observation key tuple defined in ObservationConjunction.

ObservationConjunction

This can be any conjunction expression, including a single atom, as long as it defines a set of tuples in the full set of observation keys only. Each tuple uniquely corresponds to a data point where a supported FaMa variant is evaluated.

FeatureConjunctionFormula

Each FeatureConjunctionFormula must provide a unique functional mapping from a tuple that consists of a subset of the observation keys and the FeatureKeyKey to a FeatureValueKey.

One can use a foreign key, i.e., a key that is neither an observation key nor FeatureKeyKey, as long as no such foreign key remains in the final mapping. However, if projecting the foreign keys away produces duplicate mappings, then one should expect an inconsistent result of the lwfm model evaluation for those observation tuples that are affected by the duplicates. Moreover, no value of FeatureKeyKey should appear in the mappings produced by more than one FeatureConjunctionFormula.
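
The two conditions above can be checked on sample data before a predict rule is written. The following Python sketch is only an illustration: the function name `check_feature_mappings` and the tuple layout are assumptions, and "duplicate mapping" is interpreted here as one key mapping to two different values.

```python
def check_feature_mappings(formulas):
    """Check per-formula feature mappings for the problems described above.

    formulas: one list per FeatureConjunctionFormula; each list holds
              (observation_keys, feature_key, feature_value) tuples, with
              foreign keys already projected away.
    Returns a list of human-readable problem descriptions (empty if none).
    """
    problems = []
    feature_keys_per_formula = []
    for number, rows in enumerate(formulas, start=1):
        mapping = {}
        for obs, feature_key, value in rows:
            key = (obs, feature_key)
            # The mapping must be functional: one value per key.
            if key in mapping and mapping[key] != value:
                problems.append(
                    f"formula {number}: {key} maps to both {mapping[key]} and {value}"
                )
            mapping.setdefault(key, value)
        feature_keys_per_formula.append({fk for _, fk, _ in rows})

    # No feature key value may be produced by more than one formula.
    for a in range(len(feature_keys_per_formula)):
        for b in range(a + 1, len(feature_keys_per_formula)):
            shared = feature_keys_per_formula[a] & feature_keys_per_formula[b]
            for fk in sorted(shared):
                problems.append(
                    f"feature key {fk} appears in formulas {a + 1} and {b + 1}"
                )
    return problems


# Sample rows shaped like the sku and location features used later in this chapter.
sku_features = [(("sku2",), 10, 2.1), (("sku2",), 500, 2.1)]
loc_features = [(("loc1",), 12, 3.4), (("loc1",), 200, 3.4)]
print(check_feature_mappings([sku_features, loc_features]))  # []
```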

TargetConjunction

All the above-mentioned restrictions for FeatureConjunctionFormula apply here as well.

Additionally:

  • The full set of observation keys should be present in the functional mapping.
  • A direct correspondence between SalesValueKey and FeatureValueKey should be a part of the conjunction.
  • The value of FeatureKeyKey should be the same for all tuples resulting from this expression and should only be present in the mapping to SalesValueKey.

Required predicates explained

Here we list a number of predicates that are required, given a certain function of a predict rule. Most of the predicates must be declared and populated and some need only be declared.

Data export in paperboat format: mode = export

  • FilePathPredicate is required and should contain one element of type string, whose value is the name of the file to be exported.
  • HeaderPredicate is required and should contain one element of type string, whose value becomes the header of the exported file.
  • WeightPredicate is required and should contain one element of type float, whose value should be greater than 0.0 and less than or equal to 1.0. Each tuple over the full set of observation keys that satisfies ExportBodyConjunction is exported with a probability given by this value. Every exported tuple (i.e., every row in the exported file) has a weight that is the inverse of this value.
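
The relationship between the sampling fraction and the row weight can be sketched in Python as follows. This is an illustration of the description above, not the export implementation: the function name `sample_for_export` is hypothetical, and independent sampling of each tuple is an assumption.

```python
import random


def sample_for_export(rows, fraction, seed=None):
    """Keep each row with probability `fraction`; weight kept rows by 1/fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0.0, 1.0]")
    rng = random.Random(seed)
    weight = 1.0 / fraction
    # rng.random() returns a float in [0.0, 1.0), so fraction = 1.0 keeps every row.
    return [(row, weight) for row in rows if rng.random() < fraction]


rows = [("sku1", "loc1", "day3"), ("sku1", "loc2", "day1"), ("sku2", "loc2", "day2")]
print(sample_for_export(rows, 1.0))  # every row is kept, each with weight 1.0
```

With a fraction of 0.25, for instance, roughly a quarter of the tuples would be kept and each kept row would carry a weight of 4.0, so that the weighted row count approximates the size of the full data set.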

FaMa model evaluation: mode = eval

  • model = multi_model
    • CoefficientPredicate is required and is a functional predicate of five arguments (four keys and one value).
      • The first argument is a ModelKey of type string.
      • The second argument is also of type string, and its value can be either "bias", or one of "alpha", "beta", "gamma", and "delta": the latter correspond, respectively, to the first-order, second-order, third-order and fourth-order coefficients of the FaMa model. Please note that the model will not be evaluated properly if one does not provide coefficients of all orders lower than the highest order.
      • The third argument should be a FeatureKey, usually of some entity type.
      • The fourth argument is of type string and is used to indicate the index or indices of the remaining dimension of coefficient matrices. Recall from Section 17.1, “Equations of models supported by predict functions” that the second and higher order coefficient matrix V^(l) has a dimensionality of n by k_l. We use FeatureKey to indicate the index of the first dimension of V^(l), and we assign to the fourth argument the integer value of the second index, converted to a string.
      • The fifth argument is the value of the coefficient, of type float.

      Example 17.1. 

      Let us look at an example. A third-order coefficient of a FaMa model, V^(3)_f,0 = 0.5, can be represented as CoefficientPredicateName[ "fmdirect-order-1", "gamma", f, "0" ] = 0.5, where f is the corresponding value of the FeatureKey of this coefficient. Although "bias" is not associated with any input variable or feature, it must also have a unique value of the third argument (FeatureKey), different from any value associated with an input variable. The same FeatureKey value should be used whenever the second argument's value is "bias".

      The value of ModelKey must begin with one of the options that indicate the FaMa variant with which coefficients are associated.
      • When the first argument of CoefficientPredicate begins with "fmdirect-order", it represents a FaMa model of up to the fourth order. The second argument can be one of "bias", "alpha", "beta", "gamma", and "delta". The maximum integer value represented by the string value of the fourth argument indicates the rank of the coefficients named by the second argument. Since there is no rank dimension for bias and first-order coefficients, the value of the fourth argument should be "" whenever the second argument is "bias" or "alpha". When the second argument is "beta", "gamma", or "delta", the fourth argument is the string form of an integer in the range from 0 to k_d - 1, where d is 2, 3 or 4, respectively.
      • When the first argument of CoefficientPredicate begins with "fmdirect", it represents a FaMa model of up to the second order. The second argument can be "bias", "alpha" or "beta". It provides the same evaluation value as "fmdirect-order" of any FaMa model of second order and lower, although it uses a different implementation. We only keep it for backward compatibility.
    • ModelMappingPredicate is not required but can be used conveniently to evaluate different models for different groups of observation tuples. Any observation tuple must be mapped to one and only one value of ModelKey.
  • model = fmdirect is used to evaluate a FaMa model of up to the second degree. This form of predict rules is deprecated.
    • BiasPredicate contains one value of type float, which is the bias coefficient.
    • AlphaPredicate is a functional predicate keyed by FeatureKey and its value of type float is the first order coefficient of a given input variable or feature.
    • BetaPredicate is also a functional predicate with two keys and its value of type float is the second order coefficient of a given input variable or feature and of a given latent dimension. The first argument is FeatureKey. The second argument is usually of type int and its value maps to the index of the latent dimension of the second order coefficients.
  • model = lfmdirect. This form of predict rules is deprecated.
    • BiasPredicate is a functional predicate with one key of type int and a value of type float. It allows one to retrieve the value of the bias coefficient of a FaMa model given the index as the key.
    • AlphaPredicate should have two keys. The first one is still FeatureKey and the second one is the FaMa model index.
    • BetaPredicate should have three keys. The new key, i.e., the FaMa model index, is in the second position.
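
The conventions for the multi_model coefficient predicate described above can be turned into a simple sanity check on the coefficient data. The Python sketch below is an illustration only: `check_coefficients` is a hypothetical helper, and treating "bias" as the lowest order when looking for missing lower-order coefficients is an assumption.

```python
ORDER_NAMES = ["bias", "alpha", "beta", "gamma", "delta"]


def check_coefficients(coefficients):
    """Return a list of problems found in a set of coefficient keys.

    coefficients: dict mapping (model_key, kind, feature_key, index) to a float.
    """
    problems = []
    kinds_by_model = {}
    for model, kind, _feature, index in coefficients:
        if kind not in ORDER_NAMES:
            problems.append(f"{model}: unknown coefficient kind {kind!r}")
            continue
        kinds_by_model.setdefault(model, set()).add(kind)
        if not model.startswith("fmdirect"):
            problems.append(f"{model}: model key must begin with 'fmdirect'")
        elif kind in ("gamma", "delta") and not model.startswith("fmdirect-order"):
            problems.append(f"{model}: {kind!r} needs an 'fmdirect-order' model key")
        if kind in ("bias", "alpha"):
            # Bias and first-order coefficients have no rank dimension.
            if index != "":
                problems.append(f"{model}: {kind!r} must use an empty string as index")
        elif not (index.isascii() and index.isdigit()):
            problems.append(f"{model}: {kind!r} index {index!r} is not an integer string")

    # All orders below the highest one that is present must be provided.
    for model, kinds in sorted(kinds_by_model.items()):
        highest = max(ORDER_NAMES.index(kind) for kind in kinds)
        missing = [k for k in ORDER_NAMES[:highest] if k not in kinds]
        if missing:
            problems.append(f"{model}: missing lower-order coefficients {missing}")
    return problems


coefficients = {
    ("fmdirect-order-0", "bias", 0, ""): 20.501,
    ("fmdirect-order-0", "alpha", 5, ""): 1.1,
    ("fmdirect-order-0", "beta", 5, "0"): 3.0,
    ("fmdirect-order-0", "beta", 5, "1"): 2.0,
}
print(check_coefficients(coefficients))  # []
```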

17.4. Examples

Use the following two pragmas with a predict rule to avoid compiler errors:

lang:compiler:disableError:AGG_DISJ[] = true.
lang:compiler:disableWarning:AGG_DISJ[] = true.

Example 17.2. Export

exec <doc>
  +header[] = "Dummy header".
  +sampling_fraction[] = 1f.
  +filePath[] = "training-data-1.txt".
  _test[] = b -> boolean(b).

  lang:compiler:disableError:AGG_DISJ[] = true.
  lang:compiler:disableWarning:AGG_DISJ[] = true.
  lang:compiler:txnLifetimePulse[] = false.

  /* We allow `f` to be used in every feature conjunction */
  +_test[] = v <-
   predict <<
      mode = export,
      format = paperboat,
      configurations = { file_path = filePath
                       | header    = header
                       | weight    = sampling_fraction },
      observation_keys = [sku | loc | day],
      feature_key = nr,
      feature_value = val,
      sales_value = sales
   >>
    (
       +filePath[] = _;
       +header[] = _;
       +sampling_fraction[] = _
    ),
    observable(sku, loc, day),
    dummy(sku, a),
    (
      sku_has_feature[sku, f] = val,
      featureFromID[nr] = f
      ;
      sales[sku, loc, day] = sales,
      val = sales,
      f = target[],
      featureFromID[nr] = f
    ).
</doc>

Example 17.3. FaMa

create --unique

addblock <doc>
  #idb fcst[s, l, d] = v -> string(s), string(l), string(d), decimal(v).

  #edb coefficient[m, p, c, s] = v -> string(m), string(p), int(c), string(s),
                                      float(v).

  #edb fcst_domain(s, l, d) -> string(s), string(l), string(d).

  #edb model_name[i] = s -> int(i), string(s).

  #edb loc2model[s] = m -> string(s), string(m).

  #edb sku_has_feature[s, i] = f -> string(s), int(i), float(f).

  #edb loc_has_feature[s, i] = f -> string(s), int(i), float(f).
</doc>

addblock <doc>
  lang:compiler:disableError:AGG_DISJ[] = true.
  lang:compiler:disableWarning:AGG_DISJ[] = true.

  fcst[sku, loc, day] = v <-
   predict <<
      mode = eval,
      model = multi_model,
      model_key = model,
      coefficients = coefficient,
      observation_keys = [sku | loc | day],
      feature_key = nr,
      feature_value = val
   >>
    coefficient[model, _, _, _] = val,
    fcst_domain(sku, loc, day),
    loc2model[loc] = model,
    (
      sku_has_feature[sku, nr] = val ;
      loc_has_feature[loc, nr] = val
    ).
</doc>

exec <doc>
  +model_name[0] = "fmdirect-order-0".
  +model_name[1] = "fmdirect-order-1".
</doc>

exec <doc>
  +coefficient[model_name[0], "bias", 0, ""] = 20.501f.
  +coefficient[model_name[1], "bias", 0, ""] = 10.2f.

  +coefficient[model_name[0], "alpha", 5 , ""] = 1.1f.
  +coefficient[model_name[0], "alpha", 7 , ""] = 1.2f.
  +coefficient[model_name[0], "alpha", 10, ""] = 2.2f.
  +coefficient[model_name[0], "alpha", 12, ""] = 3.5f.
  +coefficient[model_name[1], "alpha", 5 , ""] = 1f.
  +coefficient[model_name[1], "alpha", 7 , ""] = 2.3f.
  +coefficient[model_name[1], "alpha", 10, ""] = 5.9f.
  +coefficient[model_name[1], "alpha", 12, ""] = 4.1f.

  +coefficient[model_name[0], "beta", 5,  "0"] = 3f.
  +coefficient[model_name[0], "beta", 5,  "1"] = 2f.
  +coefficient[model_name[0], "beta", 7,  "0"] = 10.1f.
  +coefficient[model_name[0], "beta", 7,  "1"] = 2.1f.
  +coefficient[model_name[0], "beta", 10, "0"] = 1.1f.
  +coefficient[model_name[0], "beta", 10, "1"] = 7.1f.
  +coefficient[model_name[0], "beta", 12, "0"] = 3.1f.
  +coefficient[model_name[0], "beta", 12, "1"] = 5.1f.

  +coefficient[model_name[1], "beta", 5,  "0"] = 2.1f.
  +coefficient[model_name[1], "beta", 5,  "1"] = 1.1f.
  +coefficient[model_name[1], "beta", 7,  "0"] = 4.1f.
  +coefficient[model_name[1], "beta", 7,  "1"] = 1.1f.
  +coefficient[model_name[1], "beta", 10, "0"] = 3.1f.
  +coefficient[model_name[1], "beta", 10, "1"] = 4.9f.
  +coefficient[model_name[1], "beta", 12, "0"] = 8.7f.
  +coefficient[model_name[1], "beta", 12, "1"] = 6.3f.

  +sku_has_feature["sku2",  10] = 2.1f.
  +sku_has_feature["sku2", 500] = 2.1f.
  +loc_has_feature["loc1",  12] = 3.4f.
  +loc_has_feature["loc1", 200] = 3.4f.
</doc>

echo "setting domain"

exec <doc>
  +fcst_domain("sku1", "loc1", "day3").
  +fcst_domain("sku1", "loc2", "day1").
  +fcst_domain("sku1", "loc2", "day3").
  +fcst_domain("sku2", "loc2", "day2").
  +fcst_domain("sku2", "loc2", "day3").

  +loc2model["loc1"] = model_name[0].
  +loc2model["loc2"] = model_name[1].
</doc>
print fcst

close --destroy
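
The manual does not show the output of `print fcst`. As a rough cross-check, the following Python sketch evaluates the second-order model by hand for two of the observations in Example 17.3, using the identity from [Rendle 2010] that rewrites the pairwise term as half the difference between a squared sum and a sum of squares. It assumes that features without coefficients (200 and 500) contribute nothing, which is an assumption about the engine and not something stated in this chapter, so the numbers are what the formula gives under that assumption and not verified engine output. Only the coefficients needed for these two observations are copied.

```python
# Data copied from Example 17.3 (only the entries needed below).
bias = {"fmdirect-order-0": 20.501, "fmdirect-order-1": 10.2}
alpha = {("fmdirect-order-0", 12): 3.5, ("fmdirect-order-1", 10): 5.9}
beta = {("fmdirect-order-0", 12): [3.1, 5.1], ("fmdirect-order-1", 10): [3.1, 4.9]}
sku_has_feature = {"sku2": {10: 2.1, 500: 2.1}}
loc_has_feature = {"loc1": {12: 3.4, 200: 3.4}}
loc2model = {"loc1": "fmdirect-order-0", "loc2": "fmdirect-order-1"}
RANK = 2  # the beta coefficients in the example use indices "0" and "1"


def forecast(sku, loc):
    """Second-order FaMa value for one (sku, loc) pair; day carries no features."""
    model = loc2model[loc]
    # Union of the two feature conjunctions; their feature keys do not overlap.
    x = {**sku_has_feature.get(sku, {}), **loc_has_feature.get(loc, {})}
    y = bias[model]
    y += sum(alpha.get((model, i), 0.0) * xi for i, xi in x.items())
    for f in range(RANK):
        terms = [beta[(model, i)][f] * xi for i, xi in x.items() if (model, i) in beta]
        # Pairwise term: 0.5 * ((sum of v*x)^2 - sum of (v*x)^2).
        y += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return y


print(forecast("sku1", "loc1"))  # about 32.401 (20.501 + 3.5 * 3.4)
print(forecast("sku2", "loc2"))  # about 22.59  (10.2 + 5.9 * 2.1)
```

In both cases only one feature has coefficients, so the pairwise term vanishes and the result reduces to the bias plus a single linear term.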