Chapter 12. Aggregations

An aggregation rule applies a function across all values selected by a given formula. Aggregations have the following syntactic form:

Aggregation =
   Formula "<-" "agg" "<<" AggregationSpecifiers ">>" Formula .

AggregationSpecifiers = AggregationSpecifier { "," AggregationSpecifier }.

AggregationSpecifier =
   OutputVariable "=" Identifier "(" [ InputVariable ] ")".

OutputVariable = Identifier.
InputVariable  = Identifier.

All aggregation specifiers have a common format: an output variable followed by = and a function whose argument is an input variable (except for count, which has no argument). The input variable should be instantiated by the formula that follows the aggregation specifier.

It is worth noting that:

  • The syntax of an aggregation is such that everything that follows the >> is treated as a single formula. The formula cannot contain a disjunction. (See Example 12.5, “No disjunctions in an aggregated formula”.)

  • The collections associated with input variables are not their instantiations in the proper sense of the word, and in particular they need not be sets. They are computed on the fly, which is much cheaper than materialising them and removing duplicates. See Example 12.4, “There is no projection onto input variables”.

  • The formula on the left-hand side must be a conjunction of single atoms. For example, the following rule:

    total_weight[m] = weight[m] + ws
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].

    will cause a compilation error with the following diagnostic message:

    error: '(weight[m] + ws)' is a complex expression which is not allowed in the head of an aggregation (code: COMPLEX_EXP_IN_AGG_HEAD)
          total_weight[m] = weight[m] + ws
          ^^^^^^^^^^^^^^ 
  • A variable that is used as the output of an aggregation must obey certain restrictions. In particular, in the head of the rule it can appear only as a value argument of a single-valued functional predicate. If there are more atoms in the head, each of them must be a single-valued functional predicate, and each must have the output variable as the value argument.

    For example, each of the following two rules is illegal:

    total_weight[m] = w, other_weight[s] = ws
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].
    total_weight[m] = ws, other_weight(s, ws)
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].

    The following would be accepted, because ws is always in a value position in both atoms:

    total_weight[m] = ws, other_weight(s; ws)
       <- agg<< ws = total(w) >> father(m, s), w = weight[s].
  • An aggregation rule cannot be recursive (directly or indirectly). For example, in Example 11.12, “IDB rules and general recursion” one might want to write the rule

    total_weight[m] = ws
       <- agg<< ws = total(w) >> father(m, s), w = total_weight[s].

    This would result in an error message beginning with:

    Error: Recursion through aggregation (AGGREGATE_RECURSION) in rule:
    
    Forall m::man,s::man,w::int,ws::int .
    total_weight[m]=ws <-
       agg<<ws = total(w)>>
          father(m,s),
          total_weight[s]=w.

Aggregation specifiers

The accepted forms of AggregationSpecifier are as follows:

V = count()

Counts the number of tuples in the collection defined by the given formula. If the collection is empty, then the aggregation doesn't calculate anything (i.e., it fails), so the calculated count will never be 0.

The parentheses may be omitted (unless the specifier is followed by another one, see Example 12.3, “Multiple specifiers”).

V1 = total(V2)

Calculates the sum, V1, of the collection of values identified by V2. However, the collection of tuples computed in the body is not projected onto V2: see Example 12.4, “There is no projection onto input variables”.

The items in the collection must be of a numeric type.

V1 = min(V2)

Calculates the minimum value, V1, of the collection of values identified by V2.

The items in the collection must be of an ordered type (note that false < true, and strings are ordered lexicographically).

V1 = max(V2)

Calculates the maximum value, V1, of the collection of values identified by V2.

The items in the collection must be of an ordered type (note that false < true, and strings are ordered lexicographically).

Example 12.1. A total aggregation

The following is an aggregation that sums over the the values in input, and stores the resulting value in result.

input(x) -> int(x).

result[] = x <- agg<<x = total(y)>> input(y).

Example 12.2. Counting the tuples

The following counts the number of tuples in predicate p:

count_p[] = z <- agg << z = count() >> p(_).

Recall that LogiQL has set semantics (unlike SQL, which has multiset/bag semantics), so all the tuples in a predicate are different from one another. If predicate p were given by the following facts,

p(1).
p(2).
p(3).
p(2).
p(1).

then the value of count_p[] would be 3.

Example 12.3. Multiple specifiers

A single aggregation can contain more than one specifier, as in the following query (which uses only local predicates, see Section 8.10, “Local predicates”):

query <doc>
  _p(0). _p(2). _p(4). _p(6).

  _c_p[] = c, _s_p[] = s, _m_p[] = m
     <- agg << c = count(), s = total(x), m = max(x) >> _p(x).

  _(c, s, m) <- c = _c_p[], s = _s_p[], m = _m_p[].
</doc>

(Note that in such a context count without parentheses would cause a syntactic error.)

The result will be:

/--------------- _ ---------------\
4 12 6
\--------------- _ ---------------/

Example 12.4. There is no projection onto input variables

The syntax of an aggregation is such that everything that follows the >> is treated as a single formula. So, for example,

sum[] = z <- agg<< z = total(x) >> p(x), q(_).

is equivalent to

sum[] = z <- agg<< z = total(x) >> (p(x), q(_)).

If p and q are given by

p(0).  p(1).  p(2).
q("ab").  q("abc").

then the value of sum[] will be 6, because the formula will produce the collection

{ (0, "ab"),  (0, "abc"),
  (1, "ab"),  (1, "abc"),
  (2, "ab"),  (2, "abc") } 

The summation is carried out over the first elements of the tuples, without first calculating a projection, which would be the set {0, 1, 2}.

Example 12.5. No disjunctions in an aggregated formula

Since an aggregation is computed on the fly, the associated formula must be a straightforward conjunction. For example, the rule

count_p3[] = z <- agg << z = count >> p(), (p(_); p(_)).

would be rejected by the compiler, with the message

error: disjunction is not supported in aggregation.  Try defining a separate predicate with the disjunction and use that predicate in this aggregation. (code: AGG_DISJ)

Example 12.6. A restriction on the output of aggregation

The following attempt to count the tuples in p would not be accepted:

count_p(z) <- agg<< z = count >> p(_).

The compiler would diagnose this rule as follows:

the aggregation output 'z' appears as a key in this predicate (code: AGG_KEY_OUTPUT)

It is possible for the predicate in the head of an aggregation rule to have arguments other than the one computed by the aggregation. Such variables provide a functionality reminiscent of GROUP BY in SQL. For example, imagine that the input data source contains information about the employees in each department: worksIn(employee, department), and you wish to report the count of employees in each department, and not just the count for the company as a whole. You might write your aggregation as follows:

Example 12.7. A per-group count aggregation

worksIn(employee, department) -> string(employee), string(department).

result[d] = x <- agg<<x = count()>> worksIn(_, d).

Note how the variable d acted to indicate that the counting was to be on a per-department basis. Also, we did not have to explicitly provide a named variable for the employee role, but used an anonymous variable instead.

This grouping approach also works with additional arguments and with the other aggregation functions, as demonstrated in the following example:

Example 12.8. A multi-argument total aggregation

Item(i), hasItemCode(i:c) -> string(c).
Region(r), hasRegionName(r:rn) -> string(rn).
Quarter(q), hasQuarterNr(q:qn) -> int(qn).
nrSoldOf_In_In_[i, r, q] = n -> Item(i), Region(r), Quarter(q), int(n).

// Total number sold for each item-region combination:
totalNrSoldOf_In_[i, r] = n -> string(i), string(r), int(n).
totalNrSoldOf_In_[i, r] = n
   <- agg<<n = total(qty)>> nrSoldOf_In_In_[i, r, _] = qty.