Chapter 11. Rules

A rule is a logical implication that specifies how the content of a predicate can be derived from the contents of other predicates. Rules have the following syntactic form:

Rule = Formula "<-" Formula "." .

The arrow (<-) is an implication operator, read as "is implied by".

The formula on the left-hand-side of the implication operator is referred to as the head of the rule; the one on the right-hand-side is the body.

The meaning of a rule is as follows: if the formula in the body holds, then the formula in the head also holds. (We sometimes say that if the formula in the body holds, then the rule fires.)

The LogicBlox database system makes sure that the implication is satisfied: this is done by populating the predicates that occur in the head of the rule with the appropriate tuples. In other words, the predicates that occur in the head are automatically derived from the predicates that occur in the body (see also Section 11.1.2, “Incremental Maintenance”).

We sometimes informally say that a rule derives into its head predicate(s).

Rules and the quantification of variables

Note

The purpose of this section (which might be skipped on a first reading of this manual) is to bring to light, in an informal fashion, some of the subtleties that have to do with the quantification of variables. The information might make the wording of compiler diagnostics somewhat less mysterious. Not all the rules given in the examples are actually legal LogiQL rules: this is further elaborated at the end of the section.

A rule is a logical implication. A rule such as the following

P <-  Q.

corresponds to the logical statement P is implied by Q.

This statement is false only when Q holds, but P does not hold. So it is equivalent to the statement P or not Q.

For more complicated bodies we must remember to use deMorgan's rules. So, for example,

P <-  Q, R, S.

corresponds to the logical statement P is implied by Q and R and S. This is equivalent to P or not (Q and R and S), which, by deMorgan's rules, is equivalent to P or not Q or not R or not S.

A LogiQL rule does not contain free variables: all the variables are (implicitly) quantified.

In a rule without negations (see the section called “Negation”), all the variables are universally quantified ("for all"). However, a variable that does not occur in the head can be also be thought of as existentially quantified ("there exists").

Example 11.1. Body variables are also existentially quantified

is_grandparent(x) <- parent(x, y), is_parent(y).

The variable y occurs only in the body. The following are two natural readings of the rule:

For all x and all y, if x is a parent of y and y is a parent, then x is a grandparent.

For all x, if there exists a y such that x is a parent of y and y is a parent, then x is a grandparent.

It is easy to show that these two are indeed equivalent. Recall that head is implied by body is equivalent to head or not body, and when a universal quantifier is moved into a negation, it becomes existential: for all x not P holds is the same as not (exists x such that P holds).

The second reading shows that y can be thought of as existentially quantified (and the rule's body is then the scope of the existential quantifier). The variable x is not existentially quantified, because the quantifier For all x cannot be moved into the body: that would leave the occurrence of x in is_granpdarent(x) free (and the two occurrences would no longer refer to the same variable).

Example 11.2. Intuition can be misleading

In many cases it is more natural and less confusing to treat a variable as existentially quantified. For example, the following is an obvious statement: If, for every y that is a child of x, y has no child, then x has no grandchild.

We might be tempted to encode this as the following rule:

has_no_grandchild(x) <- parent(x, y), has_no_child(y).

That would be wrong. What this rule actually encodes is the slightly different statement: For all x and all y, if y is a child of x and y has no child, then x has no grandchild.

Our error becomes more obvious if we read the rule as the equivalent statement: For all x, if there exists a y such that y is a child of x and y has no child, then x has no grandchild.

We can now immediately see that if x has several children, and one of them has no child, it does not necessarily follow that x has no grandchild: x could have a grandchild by one of the other children.

If the body of a rule contains negated formulas, the situation gets a little more complicated. Every variable that has occurrences only inside a negated formula is quantified existentially. The scope of the existential quantifier is the smallest negated formula that contains all the occurrences (but one must be careful to distinguish between occurrences of different variables with the same name: see Example 11.4, “A quantifier limits the scope of a variable”).

Example 11.3. Variables that occur only in the scope of a negation

Given the intuitively obvious meanings of the predicates, the following defines the notion of having a grandchild:

has_grandchild(x) <- parent(x, y), has_child(y).

This can be used to define the notion of not having a grandchild:

has_no_grandchild(x) <- ! has_grandchild(x).

We can now expand the definition of has_grandchild to obtain:

has_no_grandchild(x) <- ! (parent(x, y), has_child(y)).

In this rule the variable y has occurrences only in the scope of the negation. The rule can be read as follows: For any x, x has no grandchild if there does not exist a y such that x is a parent of y and y has a child.

"There does not exist" (or, more precisely, "it is not true that there exists") is a negated existential quantifier. The scope of y is the scope of the quantifier, i.e., in this case the entire negated formula.

Since there are no other variables named y (see Example 11.4, “A quantifier limits the scope of a variable”) we can bring the quantifier outside the negation (thus changing it to a universal quantifier). The rule would now read: For any x, x has no grandchild if, for all y, it is not the case that x is a parent of y and y has a child.

This is equivalent to For any x, x has no grandchild if, for all y, either x is not a parent of y or y has no child.

Or, equivalently (by the definition of implication): For any x, x has no grandchild if, for every y such that x is a parent of y, y has no child. This is essentially the same as the statement with which we began Example 11.2, “Intuition can be misleading”

Example 11.4. A quantifier limits the scope of a variable

We could define has_child as

has_child(x) <- daughter(x, y).
has_child(x) <- son(x, y).

We might want to use this for defining the predicate has_no_child:

has_no_child(x) <- ! daughter(x, y), ! son(x, y).

The rule can be read as: For any x, if there is no y such that y is a daughter of x and there is no y such that y is a son of x, then x has no child.

Please note that there are two existential quantifiers, and each of them quantifies a variable named y. These are actually two different variables, each of them with the name y. The rule might as well have been written as follows:

has_no_child(x) <- ! daughter(x, y), ! son(x, z).

In fact, since both y and z have only one occurrence each, the recommended way to write this is

has_no_child(x) <- ! daughter(x, _), ! son(x, _).

The point of this example is to show that:

  • the scope of a variable may be much smaller than the entire rule, and two occurrences of the same identifier need not refer to the same variable;
  • increasing the scope of a quantifier may require renaming of variables.

Note

The restrictions of LogiQL usually protect the user from most of the subtleties illustrated in this section. In particular, there is a restriction that variables must be positively bound (with the exception noted in the section called “Negation”). This restriction would make it necessary to modify the rules in our examples: all the existential quantifiers could then be regarded as having the same scope (viz., the body of the rule). For example, the rule

has_no_child(x) <- ! daughter(x, y), ! son(x, y).

would have to be rewritten as

has_no_child(x) <- person(x), person(y),
                   ! daughter(x, y), ! son(x, y).

Kinds of rules

There are many ways to divide LogiQL rules into categories. In particular, they can be classified as follows:

  • IDB (intensional database) rules;
  • EDB (extensional database) rules;
  • aggregation rules.

EDB rules are often referred to as active logic, and to understand them one must understand transaction semantics in LogicBlox. We therefore defer their description to Chapter 19, Transaction Logic, in particular to Section 19.2, “Delta logic” and Section 19.3, “Events”.

Aggregation rules are the subject of Chapter 12, Aggregations.

In this chapter we focus on IDB rules.

11.1. Basics of IDB Rules

An IDB rule may contain an arbitrarily complex formula in the body, and a conjunction of at least one atom in the head.

The following example defines IDB rules for computing the ancestral relationship between people.

Example 11.5. Computing ancestors with IDB rules

person(x), person_name(x:n) -> string(n).
parent(x, y) -> person(x), person(y).
ancestor(x, y) -> person(x), person(y).

ancestor(x, y) <- parent(x, y).
ancestor(x, y) <- parent(x, z), ancestor(z, y).

The above example has two rules (in the last two lines).

The first rule reads: if x is a parent of y, then x is an ancestor of y.

The second rule reads: if x is a parent of z, and z is an ancestor of y, then x is an ancestor of y.

A database might contain the following tuples for the predicate parent (we use the name of a person to refer to its entity value):

"Bob" "Jack"
"Bob" "Jill"
"Jack" "Alice"

Given the above, we can expect that evaluating the two rules in Example 11.5, “Computing ancestors with IDB rules” would result in the following tuples in ancestor:

"Bob" "Jack"
"Bob" "Jill"
"Jack" "Alice"
"Bob" "Alice"

Instead of specifying multiple rules whose head atoms refer to the same predicate, we may also specify one rule, with a disjunction in the body. The disjunction operator is the semicolon (;).

Example 11.6. 

The following rule is equvalent to the two rules in Example 11.5, “Computing ancestors with IDB rules”:

ancestor(x, y) <-
    parent(x, y)
  ; parent(x, z), ancestor(z, y).

The rule may be read: If x is a parent of y, or x is a parent of an ancestor of y, then x is an ancestor of y.

Finitely Bound Variables.  It is a requirement that a rule must compute a finite number of tuples. For example, the following is not a valid rule and will be rejected by the compiler:

smaller_than(x, y) -> int(x), int(y).

smaller_than(x, y) <- x < y.

It is clear that smaller_than would contain an infinite number of tuples (or, at least, infinite for all practical purposes). LogiQL does not allow such rules. Typically, rules that do not result in a finite number of tuples will result in compiler error messagess about invalid variable bindings. The compiler determines whether a rule results in a finite number of tuples by analyzing whether each variable that appears in the rule head is finitely bound. In this case, neither x nor y is finitely bound.

The exceptions to this requirement are value-constructing rules (Section 11.2, “Value-constructing Rules”), and derived-only rules (Section 11.3, “Derived-only Rules”).

Positively Bound Variables.  Another requirement is that every variable in the body must be:

  • either positively bound (i.e., with at least one binding occurrence that is not in the scope of a negation: that binding occurrence should be "finite" in the sense described above);
  • or negatively bound in the value position of a functional predicate (see the section called “Negation” for further explanations and an example).

This ensures that the search space for rule evaluation is finite.

11.1.1. Predicate Inference

As a programming convenience, LogiQL infers the existence of predicates from rules (and/or facts). This inference reduces the need for programmers to explicitly declare predicates (as described in Chapter 8, Predicates).

Example 11.7. Inferring predicates

The following program is equivalent to the one in Example 11.5, “Computing ancestors with IDB rules”:

person(x), person_name(x:n) -> string(n).
parent(x, y) -> person(x), person(y).

ancestor(x, y) <- parent(x, y).
ancestor(x, y) <- parent(x, z), ancestor(z, y).

Note that the above does not include explicit declarations of ancestor. From the two rules that derive into ancestor, the compiler infers that ancestor exists, and that its arity is two, with both arguments being of type person.

A special case of such predicate inference is inference from facts, which can be considered as rules with empty bodies. See also Section 10.1.1, “Atoms as facts”.

11.1.2. Incremental Maintenance

The LogicBlox database maintains the consistency of rules automatically, and in an incremental fashion.

Automatic maintenance means that the contents of the predicates named in each rule's head are consistent with the rule and the contents of the predicates mentioned in the rule's body. If the latter change over time, the head predicates will be updated accordingly, with no actions required from the programmer.

Example 11.8. 

Please refer to Example 11.5, “Computing ancestors with IDB rules” and the text that follows it.

If the tuple ("Bob", "Jack") is removed from the predicate parent, the tuples ("Bob", "Jack") and ("Bob", "Alice") will be automatically removed from predicate ancestor.

Similarly, if a tuple is added to parent, then the corresponding tuple(s) will be added to ancestor as well.

Incremental maintenance means that, when there is a change in the contents of predicates mentioned in the bodies of some rules, the predicates mentioned in the heads are not recomputed from scratch. The recomputation effort is proportional to the actual differences in the predicates.

11.2. Value-constructing Rules

Rules can be used not only to derive new tuples from existing values. They can also be used to derive, or construct, new values. Value-constructing rules use constructor predicates (Section 8.3.1, “Constructor Predicates”) to accomplish this.

The following example illustrates the use of a constructor predicate, person_by_name, to construct new person entity values.

Example 11.9.  Constructing person entity values by means of a rule

person(x) -> .
person_by_name[first, last] = p -> string(first), string(last), person(p).
lang:constructor(`person_by_name).

names_input(first, last) -> string(first), string(last).

person(p), person_by_name[first, last] = p <- names_input(first, last).

For each tuple of (first, last) in predicate names_input, the above rule creates exactly one person value.

Note that the rule can also be written in a simplified form:

person_by_name[first, last] = _ <- names_input(first, last).

A variable that occurs in the head of a rule, but not its body, would not be bound (see the section called “Bound variables and their instantiations” and Section 10.1.2, “Atoms in heads of rules”). Instead of just disallowing it, LogiQL gives it a special interpretation: it is treated as an existentially quantified variable that can be instantiated by a constructor to a freshly-created entity (see the example above). Such a variable cannot, of course, be instantiated by two different constructor atoms in the the same rule.

Example 11.10. An illegal value-constructing rule

person_by_ssn[ssn] = p -> string(ssn), person(p).
lang:constructor(`person_by_ssn).

input_data(first, last, ssn) ->
  string(first), string(last), string(ssn).

person(p),
   person_by_name[fn, ln] = p,
   person_by_ssn[ssn] = p <- input_data(fn, ln, ssn).

In the above rule the variable p is shared by two constructor atoms: person_by_name and person_by_ssn. This code is rejected at compile-time.

11.3. Derived-only Rules

By default, when a rule defines what should be derived into a predicate, the result of evaluating the rule is materialized and stored as the content of that predicate. There are two reasons why this might sometimes not be desirable:

  • Performance: The result of the evaluation is used infrequently, or the size of the result is disproportionately large compared to the complexity of the computation. That is, it is cheaper to simply re-evaluate the rule (and derive the contents of a predicate) than it is to store the result.
  • Expressiveness: A rule specifies a computation with infinite results, and thus cannnot be materialized. However, the logic is useful in many contexts. This is often the case when a rule specifies some arithmetic computation.

In either of these situations, the programmer may choose to define derived-only rules, which must derive into derive-only predicates.

The following example illustrates the definition and use of a derived-only predicate and rule.

Example 11.11.  Defining a derived-only predicate for computing the hypotenuse of a right triangle

The predicate hypotenuse takes the length of the two sides of a right triangle, and computes the length of the hypotenuse. We would like to reuse the definition in many different rules. However, the result of the computation cannot possibly be materialized, as it is infinite. We therefore declare hypotenuse to be a derived-only predicate, by using the pragma lang:derivationType.

hypotenuse[x, y] = z -> float(x), float(y), float(z).
lang:derivationType[`hypotenuse] = "Derived".

hypotenuse[x, y] = float:sqrt[x * x + y * y].

The predicate can be used in rules where its key arguments are positively and finitely bound. For instance, we can use hypotenuse in the following rule to help determine which tuples in candidates can be the sides of a right triangle:

candidates(x,y) -> float(x), float(y).

possible_triangles(x, y) <- candidates(x, y), hypotenuse[x, y] = _.

A derived-only rule by itself is never evaluated. It is simply unfolded into the rule in which its derived-only predicate is used.

11.4. Putting it all together: general recursion

We now know the basics of LogiQL. Before moving on to more specialised constructs, let us look at a larger example (which is somewhat silly, but might nevertheless be instructive).

Note

In what follows, we will use recursion, i.e., predicates defined by rules that use the contents of those predicates. The recursion is general : there are no special restrictions put onto the form of the recursive rules. While this works perfectly on simple examples, it might be a very serious drain on resources in the context of a larger application. See Chapter 15, Linear recursion for more information, and for ways of making recursion more efficient in those cases when what we really want is iteration.

Example 11.12. IDB rules and general recursion

To celebrate the 80 birthday of great-grandfather Abe, who is a little heavy but still very fit, the menfolk of the Johnston family decided to go for a hunting trip in Alaska. The trip will involve renting a light plane, so it is necessary to find out the total weight of all the participants.

We start by describing the men of the family, and their relationships. We represent the men not as strings, but as enities (of type man).

  man(m), name(m : x) -> string(x).

  father(f, son) -> man(f), man(son).

  name(_ : "Abe").
  name(_ : "Bob").   name(_ : "Charlie"). name(_ : "Dave").
  name(_ : "Ed").    name(_ : "Fred").    name(_ : "George").
  name(_ : "Henry").                      name(_ : "Ike").    name(_ : "Jim").

  father(a, b),
  father(a, c), father(a, d) <- name(a : "Abe"),     name(b : "Bob"),
                                                     name(c : "Charlie"),
                                                     name(d : "Dave").
  father(b, e)               <- name(b : "Bob"),     name(e : "Ed").
  father(c, f)               <- name(c : "Charlie"), name(f : "Fred").
  father(d, g)               <- name(d : "Dave"),    name(g : "George").
  father(e, h)               <- name(e : "Ed"),      name(h : "Henry").
  father(g, i), father(g, j) <- name(g : "George"),  name(i : "Ike"),
                                                     name(j : "Jim").

The weights of everyone are as follows:

weight[m] = w -> man(m), int(w).
weight[a] = 200 <- name(a : "Abe").
weight[b] = 180 <- name(b : "Bob").
weight[c] = 170 <- name(c : "Charlie").
weight[d] = 160 <- name(d : "Dave").
weight[e] = 160 <- name(e : "Ed").
weight[f] = 150 <- name(f : "Fred").
weight[g] = 140 <- name(g : "George").
weight[h] = 100 <- name(h : "Henry").
weight[i] = 110 <- name(i : "Ike").
weight[j] = 100 <- name(j : "Jim").

We now want to compute, for each man, the combined weight of him and all his descendants. The result for great-grandfather Abe will then be the answer we are looking for.

We declare the functional predicate that maps each man to his combined weight:

// The total weight of `m` and all his descendants.
total_weight[m] = w -> man(m), int(w).

We must now find rules that will populate this predicate with the correct information.

The first step is trivial:

total_weight[m] = weight[m] + weight_of_descendants[m].

// The total weight of all the descendants of `m`.
weight_of_descendants[m] = w -> man(m), int(w).

Now the fun begins: how do we define weight_of_descendants? You are encouraged to try it for yourself before reading on.

For men who are not fathers, this is simple enough:

weight_of_descendants_of[m] = 0 <- man(m), ! father(m, _).

(Notice that we had to use man(m) in order to bind the variable outside the negation: see the section called “Negation”.)

Since none of the men has more than three sons, we might be tempted to write:

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u).

A moment's reflection will show that this logic is flawed: there is nothing to prevent s, t and u from representing the same person, whose weight would thus be counted more than once. Moreover, if m has more than one son, then there are several possible ways of satisfying such a rule, and some of them will give different answers: here the LogicBlox system would help us by complaining about a functional dependency violation (FDV).

We can try to fix this:

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u),
      t != s, u != s, u != t.

The result is that this rule never fires, and we get information only about the total weights of Fred, Ike, Henry and Jim (who have no sons). This is not surprising, since only Abe has three sons, and there is no rule that will compute the total weight for any of them.

A way out of this predicament is to provide a separate rule for each of the remaining cases: one son and two sons. For the case of two sons it would seem that the following should do the trick:

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), s != t,
      ! (father(m, u), u != s, u != t).

The compiler does not allow such a rule (see Example 10.25, “An illegal use of a negatively bound variable”).

An attempt to mechanically rewrite this to

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), s != t,
      ! has_other_sons(m, s, t).

has_other_sons(m, s, t) -> man(m), man(s), man(t).
has_other_sons(m, s, t) <- father(m, u), u != s, u != t.

will bring no immediate joy. The compiler will rightly complain that variable t is unbound (indeed, variable s is unbound, too.)

If you haven't solved the problem already, this might be a good time to experiment with various approaches. We provide just one possible solution:

// The total weight of all the descendants of `m`.
weight_of_descendants_of[m] = w -> man(m), int(w).

weight_of_descendants_of[m] = 0 <- man(m), ! father(m, _).

weight_of_descendants_of[m] = total_weight[s]
   <- father(m, s), ! has_at_least_two_sons(m).

weight_of_descendants_of[m] = total_weight[s] + total_weight[t]
   <- father(m, s), father(m, t), t != s,
      ! has_at_least_three_sons(m).

weight_of_descendants_of[m] = total_weight[s] +
                              total_weight[t] + total_weight[u]
   <- father(m, s), father(m, t), father(m, u),
      t != s, u != s, u != t.


// Does `m` have at least two different sons?
has_at_least_two_sons(m) <- father(m, s), father(m, t), t != s.

// Does `m` have at least three different sons?
has_at_least_three_sons(m) <- father(m, s), father(m, t), father(m, u),
                              t != s, u != s, u != t.