## Chapter 17. Predict Functions

LogiQL provides a special form of rule (a predict rule) to evaluate basic Factorization Machine (FaMa) models [Rendle 2010]. Predict rules can also be used to export data in paperboat format, which is the default data format of Infor’s light-weight factorization machine package (a.k.a. lwfm). The full grammar of predict rules is given in Section 17.2, “The general form of predict rules”. Restrictions on the elements of predict rules are discussed in detail in Section 17.3, “Restrictions and requirements”. We conclude the chapter by presenting a number of examples to help the reader familiarize herself with the construction of predict rules for several use cases.

### Note

The definition of predict rules in Section 17.2, “The general form of predict rules” may seem intimidating. We encourage the reader to examine the examples in order to develop some basic intuitions, and only then to consult Section 17.2, “The general form of predict rules” and Section 17.3, “Restrictions and requirements” for additional clarifications.

## 17.1. Equations of models supported by predict functions

### 17.1.1. Factorization Machine (FaMa) model

This is the equation of the basic FaMa model of degree `d` [Rendle 2010]:

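Equation 17.1. Basic Factorization Machine model

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d} \; \sum_{i_1=1}^{n} \cdots \sum_{i_l=i_{l-1}+1}^{n} \left( \prod_{j=1}^{l} x_{i_j} \right) \left( \sum_{r=1}^{k_l} \prod_{j=1}^{l} v^{(l)}_{i_j,r} \right)$$

where $n$ is the total number of input variables (also known as features), $k_l$ is the dimensionality of the factorization (also known as rank) for order $l$, and the model parameters are the bias $w_0$, the first-order/linear coefficients (alpha) $w \in \mathbb{R}^{n}$, and the `l`-order coefficients (beta) $V^{(l)} \in \mathbb{R}^{n \times k_l}$.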

Currently LogiQL supports this general form for `d` up to 4.

Unless stated otherwise, in this chapter is used to denote the model evaluated by predict functions, and or is used to denote the basic form of the FaMa model.
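To make the model concrete, the following Python sketch evaluates a factorization machine of degree `d` by brute force, summing over every set of distinct features as in the d-way model of [Rendle 2010]. It is an illustration only and not the implementation used by predict functions; the function name `fama_predict` and the dictionary-based data layout are assumptions made for this example.

```python
from itertools import combinations
from math import prod


def fama_predict(x, bias, alpha, factors):
    """Evaluate a basic factorization machine by brute force.

    x       -- dict: feature key -> feature value (absent features count as 0)
    bias    -- the bias coefficient
    alpha   -- dict: feature key -> first-order coefficient
    factors -- dict: order l (2, 3, ...) -> dict: feature key -> list of k_l floats
    """
    # Bias plus the linear term.
    y = bias + sum(alpha.get(i, 0.0) * v for i, v in x.items())

    # One interaction term per order l and per set of l distinct features.
    for l, v_l in factors.items():
        # Features without coefficients of this order contribute nothing.
        active = [i for i in x if i in v_l]
        for combo in combinations(active, l):
            rank = len(v_l[combo[0]])
            weight = sum(prod(v_l[i][r] for i in combo) for r in range(rank))
            y += weight * prod(x[i] for i in combo)
    return y


# Hypothetical data: two features, a second-order model of rank 2.
x = {5: 1.0, 7: 2.0}
alpha = {5: 1.1, 7: 1.2}
beta = {5: [3.0, 2.0], 7: [10.1, 2.1]}
y = fama_predict(x, 20.501, alpha, {2: beta})
print(y)  # approximately 93.001
```

The brute-force enumeration costs time exponential in the order, which is acceptable for a small worked example but not for production-size feature sets.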

## 17.2. The general form of predict rules

The EBNF grammar below is a little more restrictive than necessary. The comma-separated items in `ExportParameterSet` and `EvalParameterSet` can be given in any order, but must all be present. The same is true of the items in `RequiredConfig` (which are separated by vertical bars). The order of elements in the various disjunctions and conjunctions in the body can also be changed.

The named identifiers (e.g., `ModelKey`) are intended not only to guide the intuition, but often also to indicate that the same identifier is to be used in several places (e.g., the same `ModelKey` in `EvalParameterSet` and `CoefficientConjunction`). Note, however, that this is not always the case: in particular `Var` stands just for a variable name.

```
PredictRule = ExportHeadAtom "<-"
    "predict" "<<" ExportParameterSet ">>" ExportBodyConjunction "."
  | EvalHeadAtom "<-"
    "predict" "<<" EvalParameterSet ">>" EvalBodyConjunction "." .

ExportHeadAtom = ExportStatePredicateName "[" "]" "=" BooleanVariable .

EvalHeadAtom = EvalResultPredicateName "[" ObservationKey { "," ObservationKey } "]"
    "=" DecimalVariable .

ExportParameterSet = "mode" "=" "export"
"," "format"           "=" "paperboat"
"," "configurations"   "=" RequiredConfig
"," "observation_keys" "=" ObservationKeys
"," "feature_key"      "=" FeatureKeyKey
"," "feature_value"    "=" FeatureValueKey
"," "sales_value"      "=" SalesValueKey .

EvalParameterSet   = "mode" "=" "eval"
"," "model"            "=" "multi_model"
"," "model_key"        "=" ModelKey
"," "coefficients"     "=" CoefficientPredicateName
"," "observation_keys" "=" ObservationKeys
"," "feature_key"      "=" FeatureKeyKey
"," "feature_value"    "=" FeatureValueKey .

RequiredConfig = "{" "file_path" "=" FilePathPredicateName
"|" "weight"    "=" WeightPredicateName
"}" .

ObservationKeys = "[" ObservationKey { "|" ObservationKey } "]" .

ExportBodyConjunction = RequiredConfigDisjunction ","
ObservationConjunction "," ExportFeatureDisjunction .

EvalBodyConjunction = CoefficientConjunction ","
ObservationConjunction "," EvalFeatureDisjunction .

RequiredConfigDisjunction =  "(" FilePathPredicateName "[" "]" "=" Var
";" HeaderPredicateName   "[" "]" "=" Var
";" WeightPredicateName   "[" "]" "=" Var
")" .

CoefficientConjunction =
CoefficientPredicateName "[" ModelKeyValue "," Var "," Var "," Var "]" "=" Var
"," DomainPredicateName "(" ObservationKey  { "," ObservationKey } ")"
"," ( ModelKey "=" ModelKeyValue
| ModelMappingPredicateName "[" ObservationKey { "," ObservationKey } "]"
"=" ModelKey
).

ExportFeatureDisjunction =
"(" TargetConjunction
";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
")" .

EvalFeatureDisjunction =
"(" FeatureConjunctionFormula
";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
")" .

ObservationConjunction    = Conjunction .
TargetConjunction         = Conjunction .
FeatureConjunctionFormula = Conjunction .

ModelKeyValue = StringLiteral .

Var                      = Identifier .
BooleanVariable          = Identifier .
DecimalVariable          = Identifier .
FeatureKeyKey            = Identifier .
FeatureValueKey          = Identifier .
SalesValueKey            = Identifier .
ObservationKey           = Identifier .
CoefficientPredicateName = Identifier .
DomainPredicateName      = Identifier .
FilePathPredicateName    = Identifier .
HeaderPredicateName      = Identifier .
WeightPredicateName      = Identifier .
```

It is worth noting that predict rules enjoy the native parallel support of LogiQL and are incrementally maintained.

### Deprecated form of predict rules

There is, however, an obsolete form of predict rules which, though still supported, is not incrementally maintained. This may result in excessively costly evaluation, so that form of predict rules should be avoided whenever possible.

The syntax of that obsolete form of predict rules is given below for completeness. Notice that in `EvalParameterSet` we have `model = fmdirect` or `model = lfmdirect` instead of `model = multi_model`.

### Note

Support for this form of predict rules will be removed in a future version of LogicBlox.

```
PredictRuleObsolete =
    ExportHeadAtom "<-"
    "predict" "<<" ExportParameterSet ">>" ExportBodyConjunction "."
  | EvalHeadAtom "<-"
    "predict" "<<" EvalParameterSet ">>" EvalBodyConjunction "." .

ExportHeadAtom = ExportStatePredicateName "[" "]" "=" BooleanVariable .

EvalHeadAtom = EvalResultPredicateName "[" ObservationKey { "," ObservationKey } "]" "="
    DecimalVariable .

ExportParameterSet = "mode" "=" "export"
"," "format" "=" "paperboat"
"," "configurations" "=" RequiredConfig
"," "observation_keys" "=" ObservationKeys
"," "feature_key" "=" FeatureKeyKey
"," "feature_value" "=" FeatureValueKey
"," "sales_value" "=" SalesValueKey .

EvalParameterSet = "mode" "=" "eval"
"," ModelCoefficients
"," "observation_keys" "=" ObservationKeys
"," "feature_key" "=" FeatureKeyKey
"," "feature_value" "=" FeatureValueKey .

RequiredConfig = "{" "file_path" "=" FilePathPredicateName
"|" "weight" "=" WeightPredicateName
"}" .

ObservationKeys = "[" ObservationKey { "|" ObservationKey } "]" .

ModelCoefficients = FMModelCoefficients | LFMModelCoefficients .

FMModelCoefficients = "model" "=" "fmdirect"
"," "coefficients" "=" "{" CommonCoefficients "}" .

LFMModelCoefficients =
"model" "=" "lfmdirect"
"," "coefficients" "=" "{" CommonCoefficients "|" LFMCoefficients "}" .

CommonCoefficients = "bias" "=" BiasPredicateName
"|" "alpha" "=" AlphaPredicateName
"|" "beta" "=" BetaPredicateName .

LFMCoefficients = "centroid" "=" CentroidPredicateName
"|" "alpha" "=" AlphaPredicateName .

ExportBodyConjunction = RequiredConfigDisjunction
"," ObservationConjunction
"," ExportFeatureDisjunction
"(" FeatureDisjunction ";" TargetConjunction ")" .

EvalBodyConjunction = CoefficientDisjunction
"," ObservationConjunction
"," EvalFeatureDisjunction .

RequiredConfigDisjunction = "(" FilePathPredicateName "[" "]" "=" Var
";" HeaderPredicateName   "[" "]" "=" Var
";" WeightPredicateName   "[" "]" "=" Var
")" .

CoefficientDisjunction =
"(" BiasPredicateName  "[" [ Var ] "]" "=" Var
";" AlphaPredicateName "[" FeatureKey [ "," Var ] "]" "=" Var
";" BetaPredicateName  "[" FeatureKey  [ "," Var ] "," Var "]" "=" Var
{ ";" CentroidPredicateName "[" Var "," Var "]" "=" Var }
")" .

ExportFeatureDisjunction =
"(" TargetConjunction
";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
")" .

EvalFeatureDisjunction =
"(" FeatureConjunctionFormula
";" FeatureConjunctionFormula { ";" FeatureConjunctionFormula }
")" .

ObservationConjunction    = Conjunction .
FeatureConjunctionFormula = Conjunction .

BooleanVariable       = Identifier .
DecimalVariable       = Identifier .
ObservationKey        = Identifier .
FeatureKeyKey         = Identifier .
FeatureValueKey       = Identifier .
SalesValueKey         = Identifier .
FilePathPredicateName = Identifier .
HeaderPredicateName   = Identifier .
WeightPredicateName   = Identifier .
BiasPredicateName     = Identifier .
AlphaPredicateName    = Identifier .
BetaPredicateName     = Identifier .
CentroidPredicateName = Identifier .
```

## 17.3. Restrictions and requirements

In this section we provide further restrictions on several elements of a predict rule. These restrictions, along with the imposed rigid form of predict rules, reflect either the assumed rule structure in the implementation of predict functions or the requirements of the FaMa model formalism.

### The head atom

Only one atom is allowed in the head of a predict rule.

`ExportHeadAtom` refers to a predicate that contains one value of type `boolean`: it will be populated either with `true` or with `false`, depending on the outcome of rule evaluation.

`EvalHeadAtom` refers to a functional predicate whose keys are the full set of observation keys, as defined in `EvalParameterSet`, and whose value is of type `decimal`. The predict rule will populate this predicate with the value of the model for each observation key tuple defined in `ObservationConjunction`.

### ObservationConjunction

This can be any conjunction expression, including a single atom, as long as it defines a set of tuples in the full set of observation keys only. Each tuple uniquely corresponds to a data point where a supported FaMa variant is evaluated.

### FeatureConjunctionFormula

Each `FeatureConjunctionFormula` must provide a unique functional mapping from a tuple that consists of a subset of the observation keys and the `FeatureKeyKey` to a `FeatureValueKey`.

One can use a foreign key, i.e., a key that is neither an observation key nor `FeatureKeyKey`, as long as no foreign key remains in the final mapping. However, if eliminating the foreign keys produces duplicate mappings, then one should expect inconsistent results of the lwfm model evaluation for the observation tuples involved in those duplicates. Moreover, no value of `FeatureKeyKey` should appear in the mappings that result from more than one `FeatureConjunctionFormula`.
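Both conditions can be checked on sample data before a predict rule is written. The Python sketch below is only an illustration: the function `check_feature_formulas` and its row layout `(keys, feature_key, value)` are hypothetical and are not part of LogiQL. Each inner list stands for the mapping produced by one `FeatureConjunctionFormula` after any foreign keys have been dropped.

```python
def check_feature_formulas(formulas):
    """Return a list of problems found in the given feature mappings.

    formulas -- list of mappings, one per FeatureConjunctionFormula;
                each mapping is a list of (keys, feature_key, value) rows,
                where `keys` is a tuple of observation key values.
    """
    problems = []
    owner = {}  # feature key -> index of the first formula that produced it
    for n, rows in enumerate(formulas):
        seen = {}  # (keys, feature key) -> value, to detect non-functional mappings
        for keys, feature, value in rows:
            if seen.setdefault((keys, feature), value) != value:
                problems.append(f"formula {n}: duplicate mapping for {(keys, feature)}")
            if owner.setdefault(feature, n) != n:
                problems.append(
                    f"feature {feature!r} appears in formulas {owner[feature]} and {n}"
                )
    return problems


# Hypothetical data: formula 0 maps ("sku1",) and feature 5 to two values,
# and formula 1 reuses feature 5.
bad = [
    [(("sku1",), 5, 1.0), (("sku1",), 5, 2.0)],
    [(("loc1",), 5, 3.0)],
]
for problem in check_feature_formulas(bad):
    print(problem)
```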

### TargetConjunction

All the above-mentioned restrictions for `FeatureConjunctionFormula` apply here as well. In addition:

• The full set of observation keys should be present in the function mapping.
• A direct correspondence between `SalesValueKey` and `FeatureValueKey` should be a part of the conjunction.
• The value of `FeatureKeyKey` should be the same for all tuples resulting from this expression and should only be present in the mapping to `SalesValueKey`.

### Required predicates explained

Here we list a number of predicates that are required, given a certain function of a predict rule. Most of the predicates must be declared and populated and some need only be declared.

Data export in paperboat format: `mode = export`

• `FilePathPredicate` is required and should contain one element of type `string`, whose value is the name of the file to be exported.
• `HeaderPredicate` is required and should contain one element of type `string`, whose value becomes the header of the exported file.
• `WeightPredicate` is required and should contain one element of type `float`, whose value should be greater than `0.0` and less than or equal to `1.0`. Each tuple over the full set of observation keys that satisfies `ExportBodyConjunction` is exported with the probability given by this value. Every exported tuple, that is, every row in the exported file, has a weight that is the inverse of this value.
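The relationship between the sampling fraction and the row weight can be sketched in Python as follows. This is a minimal illustration that assumes each tuple is sampled independently; the function `sample_for_export` and its `seed` parameter are hypothetical and do not describe how the export itself is implemented.

```python
import random


def sample_for_export(rows, fraction, seed=None):
    """Keep each row with probability `fraction`; weight kept rows by 1 / fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be greater than 0.0 and at most 1.0")
    rng = random.Random(seed)
    weight = 1.0 / fraction
    # rng.random() lies in [0.0, 1.0), so fraction = 1.0 keeps every row.
    return [(row, weight) for row in rows if rng.random() < fraction]


rows = [("sku1", "loc1", "day3"), ("sku1", "loc2", "day1")]
print(sample_for_export(rows, 1.0))     # every row kept, each with weight 1.0
print(sample_for_export(rows, 0.5, 7))  # a random subset, each kept row with weight 2.0
```

With a fraction of `1.0`, as in Example 17.2, every qualifying tuple is exported with weight `1.0`.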

FaMa model evaluation: `mode = eval`

• `model = multi_model`
  • `CoefficientPredicate` is required; it is a functional predicate of five arguments (four keys and a value).
    • The first argument is a `ModelKey` of type `string`.
    • The second argument is also of type `string`, and its value is one of `"bias"`, `"alpha"`, `"beta"`, `"gamma"`, and `"delta"`: the last four correspond, respectively, to the first-order, second-order, third-order and fourth-order coefficients of the FaMa model. Please note that the model will not be evaluated properly unless coefficients of all orders lower than the highest order are provided.
    • The third argument should be a `FeatureKey`, usually of some entity type.
    • The fourth argument is of type `string` and is used to indicate the index or indices of the remaining dimension of the coefficient matrices. Recall from Section 17.1, “Equations of models supported by predict functions” that each coefficient matrix `V^(l)` of second or higher order has dimensions `n` by `k_l`. `FeatureKey` indicates the index along the first dimension of `V^(l)`, and the fourth argument holds the index along the second dimension, converted from an integer to a string.
    • The fifth argument is the value of the coefficient, of type `float`.

Example 17.1.

Let us look at an example. A third-order coefficient of a FaMa model `V^(3)_f,0=0.5` can be represented as `CoefficientPredicateName[ "fmdirect-order-1", "gamma", f, "0" ] = 0.5`, where `f` is the corresponding value of the `FeatureKey` of this coefficient. Although `"bias"` is not associated with any input variable or feature, it must also have a unique value of the third argument (`FeatureKey`) that is different from any value associated with an input variable. The same `FeatureKey` value should be used whenever the second argument's value is `"bias"`.

  The value of `ModelKey` must begin with one of the following prefixes, which indicate the FaMa variant with which the coefficients are associated.
    • When the first argument of `CoefficientPredicate` begins with `"fmdirect-order"`, it represents a FaMa model of up to the fourth order. The second argument can be one of `"bias"`, `"alpha"`, `"beta"`, `"gamma"`, and `"delta"`. The largest integer represented by the string value of the fourth argument determines the rank of the coefficients named by the second argument. Since there is no rank dimension for the bias and the first-order coefficients, the value of the fourth argument should be `""` whenever the second argument is `"bias"` or `"alpha"`. When the second argument is `"beta"`, `"gamma"`, or `"delta"`, the fourth argument is the string form of an integer in the range from 0 to `k_d - 1`, where `d` is 2, 3 or 4, respectively.
    • When the first argument of `CoefficientPredicate` begins with `"fmdirect"`, it represents a FaMa model of up to the second order. The second argument can be `"bias"`, `"alpha"` or `"beta"`. It yields the same value as `"fmdirect-order"` for any FaMa model of second order or lower, although it uses a different implementation. It is kept only for backward compatibility.
  • `ModelMappingPredicate` is not required, but it is a convenient way to evaluate different models for different groups of observation tuples. Every observation tuple must be mapped to one and only one value of `ModelKey`.
• `model = fmdirect` is used to evaluate a FaMa model of up to the second degree. This form of predict rules is deprecated.
  • `BiasPredicate` contains one value of type `float`, which is the bias coefficient.
  • `AlphaPredicate` is a functional predicate keyed by `FeatureKey`; its value, of type `float`, is the first-order coefficient of a given input variable or feature.
  • `BetaPredicate` is also a functional predicate, with two keys; its value, of type `float`, is the second-order coefficient of a given input variable or feature and of a given latent dimension. The first argument is `FeatureKey`. The second argument is usually of type `int`, and its value maps to the index of the latent dimension of the second-order coefficients.
• `model = lfmdirect`. This form of predict rules is deprecated.
  • `BiasPredicate` is a functional predicate with one key of type `int` and a value of type `float`. It allows one to retrieve the value of the bias coefficient of a FaMa model, given the model index as the key.
  • `AlphaPredicate` should have two keys. The first one is still `FeatureKey` and the second one is the FaMa model index.
  • `BetaPredicate` should have three keys. The new key, i.e., the FaMa model index, is in the second position.
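The five-argument layout required for `model = multi_model` can be illustrated with a short Python sketch that flattens a set of model parameters into tuples of the form (model key, coefficient kind, feature key, index string, value). The function `coefficient_facts` and its parameter names are hypothetical; the sketch only mirrors the conventions described above (an empty index string for `"bias"` and `"alpha"`, zero-based index strings for higher orders, and a bias feature key that differs from every real feature key).

```python
ORDER_NAMES = {2: "beta", 3: "gamma", 4: "delta"}


def coefficient_facts(model_key, bias_key, bias, alpha, factors):
    """Flatten model parameters into five-argument coefficient tuples.

    alpha   -- dict: feature key -> first-order coefficient
    factors -- dict: order (2, 3 or 4) -> dict: feature key -> list of floats
    """
    used = set(alpha)
    for matrix in factors.values():
        used.update(matrix)
    if bias_key in used:
        raise ValueError("bias_key must differ from every feature key")

    facts = [(model_key, "bias", bias_key, "", bias)]
    for feature, value in alpha.items():
        facts.append((model_key, "alpha", feature, "", value))
    for order in sorted(factors):
        for feature, row in factors[order].items():
            # The rank index is stored as a string, starting at "0".
            for index, value in enumerate(row):
                facts.append((model_key, ORDER_NAMES[order], feature, str(index), value))
    return facts


facts = coefficient_facts("fmdirect-order-0", 0, 20.501, {5: 1.1}, {2: {5: [3.0, 2.0]}})
for fact in facts:
    print(fact)
```

Each printed tuple corresponds to one fact of the coefficient predicate, in the same shape as the `+coefficient[...]` facts of Example 17.3.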

## 17.4. Examples

Use the following two pragmas with a predict rule to avoid compiler errors:

```
lang:compiler:disableError:AGG_DISJ[] = true.
lang:compiler:disableWarning:AGG_DISJ[] = true.
```

Example 17.2. Export

```
exec <doc>
  +sampling_fraction[] = 1f.
  +filePath[] = "training-data-1.txt".
  _test[] = b -> boolean(b).

  lang:compiler:disableError:AGG_DISJ[] = true.
  lang:compiler:disableWarning:AGG_DISJ[] = true.

  /* We allow `f` to be used in every feature conjunction */
  +_test[] = v <-
    predict <<
      mode = export,
      format = paperboat,
      configurations = { file_path = filePath
                       | weight    = sampling_fraction },
      observation_keys = [sku | loc | day],
      feature_key = nr,
      feature_value = val,
      sales_value = sales
    >>
    (
      +filePath[] = _;
      +sampling_fraction[] = _
    ),
    observable(sku, loc, day),
    dummy(sku, a),
    (
      sku_has_feature[sku, f] = val,
      featureFromID[nr] = f
    ;
      sales[sku, loc, day] = sales,
      val = sales,
      f = target[],
      featureFromID[nr] = f
    ).
</doc>
```

Example 17.3. FaMa

```
create --unique

addblock <doc>
  #idb fcst[s, l, d] = v -> string(s), string(l), string(d), decimal(v).

  #edb coefficient[m, p, c, s] = v -> string(m), string(p), int(c), string(s),
       float(v).

  #edb fcst_domain(s, l, d) -> string(s), string(l), string(d).

  #edb model_name[i] = s -> int(i), string(s).

  #edb loc2model[s] = m -> string(s), string(m).

  #edb sku_has_feature[s, i] = f -> string(s), int(i), float(f).

  #edb loc_has_feature[s, i] = f -> string(s), int(i), float(f).
</doc>

addblock <doc>
  lang:compiler:disableError:AGG_DISJ[] = true.
  lang:compiler:disableWarning:AGG_DISJ[] = true.

  fcst[sku, loc, day] = v <-
    predict <<
      mode = eval,
      model = multi_model,
      model_key = model,
      coefficients = coefficient,
      observation_keys = [sku | loc | day],
      feature_key = nr,
      feature_value = val
    >>
    coefficient[model, _, _, _] = val,
    fcst_domain(sku, loc, day),
    loc2model[loc] = model,
    (
      sku_has_feature[sku, nr] = val ;
      loc_has_feature[loc, nr] = val
    ).
</doc>

exec <doc>
  +model_name[0] = "fmdirect-order-0".
  +model_name[1] = "fmdirect-order-1".
</doc>

exec <doc>
  +coefficient[model_name[0], "bias", 0, ""] = 20.501f.
  +coefficient[model_name[1], "bias", 0, ""] = 10.2f.

  +coefficient[model_name[0], "alpha", 5 , ""] = 1.1f.
  +coefficient[model_name[0], "alpha", 7 , ""] = 1.2f.
  +coefficient[model_name[0], "alpha", 10, ""] = 2.2f.
  +coefficient[model_name[0], "alpha", 12, ""] = 3.5f.
  +coefficient[model_name[1], "alpha", 5 , ""] = 1f.
  +coefficient[model_name[1], "alpha", 7 , ""] = 2.3f.
  +coefficient[model_name[1], "alpha", 10, ""] = 5.9f.
  +coefficient[model_name[1], "alpha", 12, ""] = 4.1f.

  +coefficient[model_name[0], "beta", 5,  "0"] = 3f.
  +coefficient[model_name[0], "beta", 5,  "1"] = 2f.
  +coefficient[model_name[0], "beta", 7,  "0"] = 10.1f.
  +coefficient[model_name[0], "beta", 7,  "1"] = 2.1f.
  +coefficient[model_name[0], "beta", 10, "0"] = 1.1f.
  +coefficient[model_name[0], "beta", 10, "1"] = 7.1f.
  +coefficient[model_name[0], "beta", 12, "0"] = 3.1f.
  +coefficient[model_name[0], "beta", 12, "1"] = 5.1f.

  +coefficient[model_name[1], "beta", 5,  "0"] = 2.1f.
  +coefficient[model_name[1], "beta", 5,  "1"] = 1.1f.
  +coefficient[model_name[1], "beta", 7,  "0"] = 4.1f.
  +coefficient[model_name[1], "beta", 7,  "1"] = 1.1f.
  +coefficient[model_name[1], "beta", 10, "0"] = 3.1f.
  +coefficient[model_name[1], "beta", 10, "1"] = 4.9f.
  +coefficient[model_name[1], "beta", 12, "0"] = 8.7f.
  +coefficient[model_name[1], "beta", 12, "1"] = 6.3f.

  +sku_has_feature["sku2",  10] = 2.1f.
  +sku_has_feature["sku2", 500] = 2.1f.
  +loc_has_feature["loc1",  12] = 3.4f.
  +loc_has_feature["loc1", 200] = 3.4f.
</doc>

echo "setting domain"

exec <doc>
  +fcst_domain("sku1", "loc1", "day3").
  +fcst_domain("sku1", "loc2", "day1").
  +fcst_domain("sku1", "loc2", "day3").
  +fcst_domain("sku2", "loc2", "day2").
  +fcst_domain("sku2", "loc2", "day3").

  +loc2model["loc1"] = model_name[0].
  +loc2model["loc2"] = model_name[1].
</doc>
print fcst

close --destroy
```